# Head-Driven Phrase Structure Grammar

The handbook

Edited by Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig

Empirically Oriented Theoretical Morphology and Syntax 9

# Empirically Oriented Theoretical Morphology and Syntax

Chief Editor: Stefan Müller

Consulting Editors: Berthold Crysmann, Laura Kallmeyer

Müller, Stefan, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.). 2021. *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax 9). Berlin: Language Science Press.

This title can be downloaded at: http://langsci-press.org/catalog/book/259

© 2021, the authors

Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

ISBN: 978-3-96110-255-6 (Digital), 978-3-98554-999-3 (Hardcover)

ISSN: 2366-3529

DOI: 10.5281/zenodo.5543318

Source code available from www.github.com/langsci/259

Collaborative reading: paperhive.org/documents/remote?type=langsci&id=259

Cover and concept of design: Ulrike Harbort

Typesetting: Stefan Müller, Elizabeth Pankratz

Proofreading: Elizabeth Pankratz

Fonts: Libertinus, Arimo, DejaVu Sans Mono

Typesetting software: XeLaTeX

Language Science Press, xHain, Grünberger Str. 16, 10243 Berlin, Germany (langsci-press.org)

Storage and cataloguing done by FU Berlin

# **Contents**




# **Preface**

Head-Driven Phrase Structure Grammar (HPSG) is a declarative (or, as is often said, constraint-based) monostratal approach to grammar which dates back to early 1985, when Carl Pollard presented his *Lectures on HPSG*. It was developed initially in joint work by Pollard and Ivan Sag, but many other people have made important contributions to its development over the decades. It provides a framework for the formulation and implementation of natural language grammars which are (i) linguistically motivated, (ii) formally explicit, and (iii) computationally tractable. From the very beginning it has involved both theoretical and computational work, seeking to address both the theoretical concerns of linguists and the practical issues involved in building a useful natural language processing system.

HPSG is an eclectic framework which has drawn ideas from the earlier Generalized Phrase Structure Grammar (GPSG, Gazdar et al. 1985), Categorial Grammar (Ajdukiewicz 1935), and Lexical-Functional Grammar (LFG, Bresnan 1982), among others. It has naturally evolved over the decades. Thus, the construction-based version of HPSG, which emerged in the mid-1990s (Sag 1997; Ginzburg & Sag 2000), differs from earlier work (Pollard & Sag 1987; 1994) in employing complex hierarchies of phrase types or constructions. Similarly, the more recent Sign-Based Construction Grammar approach differs from earlier versions of HPSG in making a distinction between signs and constructions and using it to make a number of simplifications (Sag 2012).

Over the years, there have been groups of HPSG researchers in many locations engaged in both descriptive and theoretical work and often in building HPSG-based computational systems. There have also been various research and teaching networks, and an annual conference since 1993. The result of this work is a rich and varied body of research focusing on a variety of languages and offering a variety of insights. The present volume seeks to provide a picture of where HPSG is today. It begins with a number of introductory chapters dealing with various general issues. These are followed by chapters outlining HPSG ideas about some of the most important syntactic phenomena. Next are a series of chapters on other levels of description, and then chapters on other areas of linguistics. A final group of chapters considers the relation between HPSG and other theoretical frameworks.

It should be noted that for various reasons not all areas of HPSG research are covered in the handbook (e.g., phonology). So, the fact that a particular topic is not addressed in the handbook should not be interpreted as an absence of research on the topic. Readers interested in such topics can refer to the HPSG online bibliography maintained at the Humboldt Universität zu Berlin.<sup>1</sup>

All chapters were reviewed by one other author and at least one of the editors; in addition, Stefan Müller reviewed all chapters. Jean-Pierre Koenig and Stefan Müller did a final round of reading all papers and checked for consistency and cross-linking between the chapters.

# **Open access**

Many authors of this handbook have previously been involved in several other handbook projects (some covering various aspects of HPSG), and by now at least five handbook articles on HPSG are available. But the editors felt that a single authoritative resource describing the framework, available free of charge to everybody, would be an important service to the linguistic community. We hence decided to publish the book open access with Language Science Press.

# **Open source**

Since the book is freely available and no commercial interests stand in the way of openness, the LaTeX source code of the book can be made available as well. We put all relevant files on GitHub,<sup>2</sup> and we hope that they may serve as a role model for future publications of HPSG papers. Additionally, every single item in the bibliographies was checked by hand either by Stefan Müller or by one of his student assistants. We checked authors and editors; made sure first name information was complete; corrected page numbers; removed duplicate entries; added DOIs and URLs where appropriate; and added series and number information as applicable for books, book chapters, and journal issues. The result is a resource containing 2623 bibliography entries. These can be downloaded as a single readable PDF file or as a BibTeX file from https://github.com/langsci/hpsg-handbook-bib.

<sup>1</sup>https://hpsg.hu-berlin.de/HPSG-Bib/, 2021-04-29.

<sup>2</sup>https://www.github.com/langsci/259, 2021-04-29.

# **Acknowledgments**

We thank all the authors for their great contributions to the book, and for reviewing chapters and chapter outlines of the other authors. We thank Frank Richter, Bob Levine, and Roland Schäfer for discussion of points related to the handbook, and Elizabeth Pankratz for extremely careful proofreading and help with typesetting issues. We also thank Elisabeth Eberle and Luisa Kalvelage for doing bibliographies and typesetting trees of several chapters and for converting a complicated chapter from Word into LaTeX.

We thank Sebastian Nordhoff and Felix Kopecky for constant support regarding LaTeX issues, both for the book project overall and for individual authors. Felix implemented a new LaTeX class for typesetting AVMs, langsci-avm, which was used for typesetting this book. It is compatible with more modern font management systems and with the forest package, which is used for most of the trees in this book.

We thank Sašo Živanović for writing and maintaining the forest package and for help specifying particular styles with very advanced features. His package turned typesetting trees from a nightmare into pure fun! To make the handling of this large book possible, Stefan Müller asked Sašo for help with externalization of forest trees, which led to the development of the memoize package. The HPSG handbook and other book projects by Stefan were an ideal testing ground for externalization of tikz pictures. Stefan wants to thank Sašo for the intense collaboration that led to a package of great value for everybody living in the woods.

# **Abbreviations and feature names used in the book**






# **References**


Sag, Ivan A. 2012. Sign-Based Construction Grammar: An informal synopsis. In Hans C. Boas & Ivan A. Sag (eds.), *Sign-Based Construction Grammar* (CSLI Lecture Notes 193), 69–202. Stanford, CA: CSLI Publications.

# **Part I Introduction**

# **Chapter 1**

# **Basic properties and elements**

# Anne Abeillé

Université de Paris

# Robert D. Borsley

University of Essex and Bangor University

Head-Driven Phrase Structure Grammar (HPSG) is a declarative and monostratal version of Generative Grammar, in which linguistic expressions have a single relatively simple constituent structure. It seeks to develop detailed formal analyses using a system of types, features, and constraints. Constraints on types of *lexical-sign* are central to the lexicon of a language and constraints on types of *phrase* are at the heart of the syntax, and both lexical and phrasal types include semantic and phonological information. Different versions of the framework have been developed, including versions in which constituent order is a reflection not of constituent structure but of a separate system of order domains, and the Sign-Based Construction Grammar version, which makes a fundamental distinction between signs of various kinds and the constructions which license them.

# **1 Introduction**

Head-Driven Phrase Structure Grammar (HPSG) dates back to early 1985 when Carl Pollard presented his *Lectures on HPSG*. It was often seen in the early days as a revised version of the earlier Generalised Phrase Structure Grammar (GPSG) framework (Gazdar, Klein, Pullum & Sag 1985), but it was also influenced by Categorial Grammar (Ajdukiewicz 1935; Steedman 2000), and, as Pollard & Sag (1987: 1) emphasised, by other frameworks like Lexical-Functional Grammar (LFG; Bresnan 1982), as well. Naturally it has changed in various ways over the decades. This is discussed in much more detail in the next chapter (Flickinger, Pollard & Wasow 2021), but it makes sense here to distinguish three versions of HPSG. Firstly, there is what might be called early HPSG, the framework presented in Pollard & Sag (1987) and Pollard & Sag (1994).<sup>1</sup> This has most of the properties of more recent versions but only exploits the analytic potential of type hierarchies to a limited degree (Flickinger 1987; Flickinger, Pollard & Wasow 1985). Next there is what is sometimes called Constructional HPSG, the framework adopted in Sag (1997), Ginzburg & Sag (2000), and much other work. Unlike earlier work, this uses a rich hierarchy of phrase types. This is why it is called constructional.<sup>2</sup> Finally, in the 2000s, Sag developed a version of HPSG called *Sign-Based Construction Grammar* (SBCG; Sag 2012). The fact that this approach has a new name suggests that it is very different from earlier work, but probably most researchers in HPSG would see it as a version of HPSG, and it was identified as such in Sag (2010: 486). Its central feature is the special status it assigns to constructions. In earlier work, they are just types of sign, but for SBCG, signs and constructions are quite different objects. In spite of this difference, most analyses in Constructional HPSG could probably be translated into SBCG and vice versa. In this chapter we will concentrate on the ideas of Constructional HPSG, which is probably the version of the framework that has been most widely assumed. We will comment briefly on SBCG in the penultimate section.

The chapter is organised as follows. In Section 2, we set out the properties that characterise the approach and the assumptions it makes about the nature of linguistic analyses and the conduct of linguistic research. Then, in Section 3, we consider the main elements of HPSG analyses: types, features, and constraints. In Section 4, we look more closely at the HPSG approach to the lexicon, and in Section 5, we outline the basics of the HPSG approach to syntax. In Section 6, we look at some further syntactic structures, and in Section 7, we consider some further topics, including SBCG. Finally, in Section 8, we summarise the chapter.

# **2 Properties**

Perhaps the first thing to say about HPSG is that it is a form of Generative Grammar in the sense of Chomsky (1965: 4). This means that it seeks to develop precise and explicit analyses of grammatical phenomena. But unlike many versions of Generative Grammar, it is a declarative or constraint-based approach to grammar, belonging to what Pullum & Scholz (2001) call "Model Theoretic Syntax". As such, it assumes that a linguistic analysis involves a set of constraints to which linguistic objects must conform, and that a linguistic object is well-formed if and only if it conforms to all relevant constraints.<sup>3</sup> This includes linguistic objects of all kinds: words, phrases, phonological segments, and so on. There are no procedures constructing representations such as the phrase structure and transformational rules of classical Transformational Grammar or the Merge and Agree operations of Minimalism. Of course, speakers and hearers do construct representations and must have procedures that enable them to do so, but this is a matter of performance, and there is no need to think that the knowledge that is used in performance has a procedural character. Rather, the fact that it is used in both production and comprehension (and other activities, e.g. translation) suggests that it should be neutral between the two and hence declarative. For further discussion of the issues, see e.g. Pullum & Scholz (2001), Postal (2003), Sag & Wasow (2011; 2015), and Wasow (2021), Chapter 24 of this volume.

<sup>1</sup>As discussed in Richter (2021), Chapter 3 of this volume, the approaches that are developed in these two books have rather different formal foundations. However, they propose broadly similar syntactic analyses, and for this reason it seems reasonable to group them together as early HPSG.

<sup>2</sup>As discussed below, HPSG has always assumed a rich hierarchy of lexical types. One might argue, therefore, that it has always been constructional.

HPSG is also a monostratal approach, which assumes that linguistic expressions have a single constituent structure. This makes it quite different from Transformational Grammar, in which an expression can have a number of constituent structures. It means, among other things, that there is no possibility of saying that an expression occupies one position at one level of structure and another position at another level. Hence, HPSG has nothing like the movement processes of Transformational Grammar. The relations that are attributed to movement in transformational work are captured by constraints that require certain features to have the same value. For example, as discussed in Section 4, a raising sentence is one with a verb which has the same value for the feature SUBJ(ECT) as its complement and hence combines with whatever kind of subject its complement requires.

HPSG is sometimes described as a concrete approach to syntax. This description refers not only to the fact that it assumes a single constituent structure, but also to the fact that this structure is relatively simple, especially compared with the structures that are postulated within Minimalism. Unlike Minimalism, HPSG does not assume that all branching is binary. This inevitably leads to simpler, flatter structures. Also unlike Minimalism, it makes limited use of phonologically empty elements. For example, it is not assumed, as in Minimalism, that because some clauses contain a complementiser they all do, an empty one if not an overt one. Similarly, it is not assumed that because some languages like English have determiners, they all do, overt or covert. It is also not generally assumed that null subject sentences, such as (1b) from Polish, have a phonologically empty subject in their constituent structure. Thus, the constituent structure of the two following sentences is quite different, even if their semantics are similar:

<sup>3</sup>In most HPSG work, all constraints are equal. Hence, there is no possibility – as there is in Optimality Theory (Prince & Smolensky 2004) – of violating one if it is the only way to satisfy another more important one (Malouf 2003). However, see Müller & Kasper (2000) and Oepen et al. (2004) for HPSG parsers with probabilities or weighted constraints.

(1) a. I read a book.

b. Czytałem książkę.
read.PST.1SG book.ACC
'I read a book.'

It is also assumed in much HPSG work that there are no phonologically empty elements in the constituent structure of an unbounded dependency construction such as the following:

(2) What did you say?

On this view, the verb *say* in (2) does not have an empty complement. There is, however, some debate here (Sag & Fodor 1995; Müller 2004; Borsley & Crysmann 2021: Section 3, Chapter 13 of this volume).

A further important feature of HPSG is a rejection of the Chomskyan idea that grammatical phenomena can be divided into a core, which merits serious investigation, and a periphery, which can be safely ignored.<sup>4</sup> This means that HPSG is not only concerned with such "core" phenomena as *wh*-interrogatives, relative clauses, and passives, but also with more "peripheral" phenomena such as the following:

(3) b. The more I read, the more I understand.
c. Chris lied his way into the meeting.

These exemplify the nominal extraposition construction (Michaelis & Lambrecht 1996), the comparative correlative construction (Abeillé 2006; Abeillé & Borsley 2008; Borsley 2011), and the *X's Way* construction (Sag 2012: Section 7.4). As we will see, HPSG is an approach which is able to accommodate broad linguistic generalisations, highly idiosyncratic facts, and everything in between.<sup>5</sup>

Another notable feature of the framework since the earliest work is a concern with semantics as well as syntax. More generally, HPSG does not try to reduce either semantics or morphology to syntax (see Crysmann 2021, Chapter 21 of this volume on morphology in HPSG and Koenig & Richter 2021, Chapter 22 of this volume on semantics). We will comment further on this in the following sections.

<sup>4</sup>This is not to deny that some constructions are more canonical and more frequent in use than others and that this may be important in various ways.

<sup>5</sup>Idioms have also been an important focus of research in HPSG. See e.g. Sag (2007: Section 5.4), Richter & Sailer (2009), Kay & Michaelis (2017), and Sailer (2021), Chapter 17 of this volume.

We turn now to some assumptions which are more about the conduct of linguistic research than the nature of linguistic analyses. Firstly, HPSG emphasises the importance of firm empirical foundations and detailed formal analyses of the kind advocated by Chomsky in *Syntactic Structures* (Chomsky 1957: 5). Whereas transformational work typically offers sketches of analyses which might be fleshed out one day, HPSG commonly provides detailed analyses which can be set out in an appendix. A notable example is Ginzburg & Sag (2000), which sets out its analysis of English interrogatives in a fifty-page appendix. Arguably, one can only be fully confident that a complex analysis works if it is incorporated into a computer implementation. Hence, computer implementations of HPSG analyses are also quite common (see e.g. Müller 1996; 2015; Copestake 2002; Bender et al. 2010; Bender 2016, and Bender & Emerson 2021, Chapter 25 of this volume).

Another property of the framework is a rejection of abstract analyses with tenuous links to the observable data. As we noted above, phonologically empty elements are only assumed if there is compelling evidence for them.<sup>6</sup> Similarly, overt elements are only assumed to have properties for which there is clear evidence. For example, words are only assumed to have case or agreement features if there is some concrete morphological evidence for them, as in Polish, illustrated in (1b). This feature of HPSG stems largely from considerations about acquisition (Müller 2016: Chapter 19; Borsley & Müller 2021: Section 5.2, Chapter 28 of this volume). Every postulated element or property for which there is no clear evidence in the data increases the complexity of the acquisition task and hence necessitates more complex innate machinery. This suggests that such elements and properties should be avoided as much as possible. It has important implications both for the analysis of individual languages and for how differences between languages are viewed.

A related property of the framework is a rejection of the idea that it is reasonable to assume that a language has some element or property if some other languages do. Many languages have case and many languages have agreement, but for HPSG, it does not follow that they all do. As Müller (2015: 25) puts it, "Grammars should be motivated on a language-specific basis." Does this mean that other languages are irrelevant when one investigates a specific language?

<sup>6</sup>There may be compelling evidence for some empty elements in some languages. For example, Borsley (2009: Section 8) argues that Welsh has phonologically empty pronouns. For general discussion of empty elements, see Müller (2016: Chapter 19.2).

Clearly not. As Müller also states, "In situations where more than one analysis would be compatible with a given dataset for language X, the evidence from language Y with similar constructs is most welcome and can be used as evidence in favour of one of the two analyses for language X" (Müller 2015: 43).

# **3 Elements**

For HPSG, a linguistic analysis is a system of types (or sorts), features, and constraints. Types provide a complex classification of linguistic objects, features identify their basic properties, and constraints impose further restrictions. In this section, we will explain these three elements. We note at the outset that HPSG distinguishes between the linguistic objects (lexemes, words, phrases, etc.) and descriptions of such objects. Linguistic objects must have all the properties of their description and cannot be underspecified in any way.<sup>7</sup> Descriptions, in contrast, can be underspecified and, in fact, always are.

There are many different kinds of types, but particularly important is the type *sign* and its various subtypes. For Ginzburg & Sag (2000: 19), this type has the subtypes *lexical-sign* and *phrase*, and *lexical-sign* has the subtypes *lexeme* and *word*. (Types are written in lower case italics.) Thus, we have the type hierarchy in Figure 1.

Figure 1: A hierarchy of types of signs

*lexeme*, *word*, and *phrase* have a complex system of subtypes. The type *lexical-sign*, its subtypes, and the constraints on them are central to the lexicon of a language, while the type *phrase*, its subtypes, and the constraints on them are at the heart of the syntax. In both cases, complex hierarchies mean that the framework is able to deal with broad, general facts, very idiosyncratic facts, and facts somewhere in between. We will say more about this below.

Signs are obviously complex objects with (at least) phonological, syntactic, and semantic properties. Hence, the type *sign* must have features that encode these properties. For much work in HPSG, phonological properties are encoded as the value of a feature PHON(OLOGY), whose value is a list of objects of type *phon*, while syntactic and semantic properties are grouped together as the value of a feature SYNSEM, whose value is an object of type *synsem*. (Features or attributes are written in small caps.) A type has certain features associated with it, and each feature has a value of some kind. A bundle of features can be represented by an attribute-value matrix (AVM) with the type name at the top on the left hand side and the features below followed by their values. Thus, signs can be described as follows:

<sup>7</sup>As pointed out by Pollard & Sag (1987: Chapter 2), HPSG grammars provide descriptions for models of linguistic objects rather than for linguistic objects per se. See also Richter (2021), Chapter 3 of this volume for a detailed discussion of the formal background of HPSG.

$$(4)\quad\begin{bmatrix} \textit{sign} \\ \text{PHON} \;\; \textit{list(phon)} \\ \text{SYNSEM} \;\; \textit{synsem} \end{bmatrix}$$

The descriptions of specific signs will obviously have specific values for the two features. For example, we might have the following simplified AVM for the phrase *the cat*:

$$(5)\quad\begin{bmatrix} \textit{phrase} \\ \text{PHON} \;\; \langle \textit{the, cat} \rangle \\ \text{SYNSEM} \;\; \text{NP} \end{bmatrix}$$

Here, following a widespread practice, we use standard orthography instead of real *phon* objects,<sup>8</sup> and we use the traditional label NP as an abbreviation for the relevant *synsem* object. We will say more about *synsem* objects shortly. First, however, we must say something about phrases.

A central feature of phrases is that they have internal constituents. More precisely, they have daughters, i.e. immediate constituents, one of which may be the head. This information is encoded by further features, for Ginzburg & Sag (2000: 29) the features DAUGHTERS (DTRS) and HEAD-DAUGHTER (HD-DTR). The value of the latter is a *sign*, and the value of the former is a list of *signs*, which includes the value of the latter.<sup>9</sup> Thus, phrases take the form in (6a), and headed phrases the form in (6b):

<sup>8</sup>See Bird & Klein (1994), Höhle (1999), and Walther (1999) for detailed approaches to phonology and structured PHON values, and De Kuthy (2021), Chapter 23 of this volume and Abeillé & Chaves (2021: 762–763), Chapter 16 of this volume for reference to structured PHON values.

<sup>9</sup>Some HPSG work, e.g. Sag (1997), has a HEAD-DAUGHTER feature and a NON-HEAD-DAUGHTERS feature, and the value of the former is not part of the value of the latter.

$$(6)\quad \text{a.}\ \begin{bmatrix} \textit{phrase} \\ \text{PHON} \;\; \textit{list(phon)} \\ \text{SYNSEM} \;\; \textit{synsem} \\ \text{DTRS} \;\; \textit{list(sign)} \end{bmatrix} \qquad \text{b.}\ \begin{bmatrix} \textit{headed-phrase} \\ \text{PHON} \;\; \textit{list(phon)} \\ \text{SYNSEM} \;\; \textit{synsem} \\ \text{DTRS} \;\; \textit{list(sign)} \\ \text{HD-DTR} \;\; \textit{sign} \end{bmatrix}$$

The sign that is the value of HEAD-DTR can be a word or a phrase. Within Minimalism, the term *head* is only applied to words. On this usage, the value of HEAD-DTR is either the head or a phrase containing the head. But there are good reasons for not adopting this usage, for example the fact that the head can be an unheaded phrase such as a coordination (see Abeillé & Chaves 2021: Section 2, Chapter 16 of this volume). So we will say that the value of HD-DTR is the head. See Jackendoff (1977: 30) for an early discussion of the term.

To take a concrete example, the phrase *the cat* might have the fuller AVM given in (7).

$$(7)\quad\begin{bmatrix} \textit{phrase} \\ \text{PHON} \;\; \langle \textit{the, cat} \rangle \\ \text{SYNSEM} \;\; \text{NP} \\ \text{DTRS} \;\; \left\langle \begin{bmatrix} \text{PHON} \;\; \langle \textit{the} \rangle \\ \text{SYNSEM} \;\; \text{Det} \end{bmatrix}\!,\; \boxed{1}\begin{bmatrix} \text{PHON} \;\; \langle \textit{cat} \rangle \\ \text{SYNSEM} \;\; \text{N} \end{bmatrix} \right\rangle \\ \text{HD-DTR} \;\; \boxed{1} \end{bmatrix}$$

Here, the two instances of the tag 1 indicate that the *sign* which is the second member of the DTRS list is also the value of HD-DTR. Thus, the word *cat* is the head of the phrase *the cat*. The occurrence of an object in more than one position in a representation, either as a feature value or as part of a feature value (a member of a list or set), as with 1 in (7), is known as re-entrancy or structure sharing. As we will see below, it is a pervasive feature of HPSG.

Most HPSG work on morphology has assumed a realizational approach, in which there are no morphemes (see Crysmann 2021, Chapter 21 of this volume). Hence, words do not have internal structures in the way that phrases do. However, it is widely assumed that lexemes and words that are derived through a lexical rule have the lexeme from which they are derived as a daughter (see Briscoe & Copestake 1999; Meurers 2001 and Section 4.2 below). Hence, the DTRS feature is relevant to words as well as phrases.

AVMs like (7) can be quite hard to look at. Hence, it is common to use traditional tree diagrams instead. Thus, we might have the tree-like representation in Figure 2 instead of (7). But one should bear in mind that AVMs correspond to (rooted) graphs and provide more detailed descriptions than traditional phrase structure trees, with richer node and edge labels, and with shared feature values between nodes. Thus, at each node, all kinds of information are available: not just syntax but also phonology, semantics, and information structure.<sup>10</sup>

<sup>10</sup>This differs from Lexical Functional Grammar, for instance, which distributes the information between different kinds of structures (see Wechsler & Asudeh 2021, Chapter 30 of this volume).


Figure 2: A simple tree for *the cat*

If the head is either obvious or unimportant, the HD-DTR annotation might be omitted. This is a convenient informal notation, but it is important to remember that it is just that and has no status within the theory.

We return now to *synsem* objects. Standardly, these have two features: LOCAL, whose value is a *local* object, and NONLOCAL, which we will deal with in Section 5. A *local* object has the features CAT(EGORY) and CONT(ENT), whose values are objects of type *category* and *content*, respectively, and the feature CONTEXT.<sup>11</sup> In much work, a *category* object has the features HEAD, SUBJ, and COMP(LEMENT)S. HEAD takes as its value a *part-of-speech* object, while SUBJ and COMPS have a list of *synsem* objects as their value. The former indicates what sort of subject a sign requires, and the latter indicates what complements it takes. In both cases, the value is the empty list if nothing is required. It is generally assumed that the SUBJ list never has more than one member. SUBJ and COMPS are often called *valence* features. Thus, the following AVM provides a fuller representation of signs:

$$(8)\quad\begin{bmatrix} \textit{sign} \\ \text{PHON} \;\; \textit{list(phon)} \\ \text{SYNSEM} \;\; \begin{bmatrix} \textit{synsem} \\ \text{LOCAL} \;\; \begin{bmatrix} \textit{local} \\ \text{CATEGORY} \;\; \begin{bmatrix} \textit{category} \\ \text{HEAD} \;\; \textit{part-of-speech} \\ \text{SUBJ} \;\; \textit{list(synsem)} \\ \text{COMPS} \;\; \textit{list(synsem)} \end{bmatrix} \\ \text{CONTENT} \;\; \textit{content} \\ \text{CONTEXT} \;\; \textit{context} \end{bmatrix} \\ \text{NONLOCAL} \;\; \textit{nonlocal} \end{bmatrix} \end{bmatrix}$$

<sup>11</sup>Words also have a MORPH (or INFL) attribute that we ignore here (see Crysmann 2021, Chapter 21 of this volume).

Anne Abeillé & Robert D. Borsley

The type *part-of-speech* has subtypes such as *noun*, *verb*, and *adjective*. In other words, we have a type hierarchy of the form given in Figure 3.

Figure 3: A hierarchy for part of speech

The type hierarchy in Figure 1 can be viewed as an ontology of possible objects in the language. A particular word or phrase must instantiate one of the maximal (most specific) types and have the properties specified for it and all its supertypes.<sup>12</sup> We might have a *synsem* object of the following form for the phrase *the cat*:

$$(9)\quad\begin{bmatrix} \textit{synsem} \\ \text{LOCAL} \;\; \begin{bmatrix} \textit{local} \\ \text{CATEGORY} \;\; \begin{bmatrix} \textit{category} \\ \text{HEAD} \;\; \textit{noun} \\ \text{SUBJ} \;\; \langle\,\rangle \\ \text{COMPS} \;\; \langle\,\rangle \end{bmatrix} \\ \text{CONTENT} \;\; \dots \end{bmatrix} \\ \text{NONLOCAL} \;\; \dots \end{bmatrix}$$

This ignores a number of matters including the value of CONTENT, CONTEXT, and NONLOCAL. It also ignores the fact that the type *noun* will have certain features, for example CASE, but it highlights some important aspects of HPSG analyses. Notice that (9) is compatible with the SYNSEM feature in (8): it contains more specific information, such as [HEAD *noun*], but no conflicting information: ⟨⟩ is the empty list and is compatible with *list(synsem)*.

Rather different from most of the features mentioned above are fairly traditional features like PERSON, NUMBER, GENDER, and CASE. In most HPSG work, these have as their value an atomic type: a type with no features. A simple treatment of person might have the types *first*, *second*, and *third*, and a simple treatment of number the types *sg* (*singular*) and *pl* (*plural*).<sup>13</sup> There are also Boolean features with + and − as their values. An example is AUX, used to distinguish auxiliary verbs ([AUX +]) from non-auxiliary verbs ([AUX −]).<sup>14</sup>

<sup>12</sup>AVMs associated with types used to be combined by unification (Pollard & Sag 1987: Chapter 2). See Richter (2021: 90–91), Chapter 3 of this volume for discussion of the term "unification".

<sup>13</sup>In practice, a more complex system of values may well be appropriate (Flickinger 2000: Section 3).

As the preceding discussion makes clear, features in HPSG can have a number of kinds of value. They may have an atomic type (PERSON, NUMBER, GENDER, CASE, AUX), a feature structure (SYNSEM, LOCAL, CATEGORY, etc.), or a list of some kind (SUBJ, COMPS).<sup>15</sup> As we will see in Section 5, HPSG also assumes features with a set as their value.

The CONTENT feature, whose value is a *content* object, highlights the importance of semantics within HPSG. But what exactly is a *content* object? Different views of semantics have been taken within the HPSG literature. Much HPSG work has assumed some version of Situation Semantics (Barwise & Perry 1983). But some work has employed so-called Minimal Recursion Semantics (Copestake, Flickinger, Pollard & Sag 2005), while others use Lexical Resource Semantics (Richter & Sailer 2004). Sag (2010: 501) adopts a conventional, Montague-style possible-worlds semantics (Montague 1974) in his analysis of English filler-gap constructions, and SBCG (Section 7.2) has generally employed a version of Frame Semantics. See Koenig & Richter (2021), Chapter 22 of this volume for a discussion of the issues.

Finally, the CONTEXT feature is used for information structure, deixis, and, more generally, pragmatics (see De Kuthy 2021, Chapter 23 of this volume).

We will say more about types and features in the following sections. We turn now to constraints. These are the machinery which imposes conditions on linguistic objects by saying that if an object has some property or properties, it must have some other property or properties. Constraints take the following form:<sup>16</sup>

(10) X ⇒ Y

Commonly, X is a type and Y a feature description, and this is the case in all the constraints that we discuss below. However, X may also be a feature description with or without an associated type. This is necessary, for example, in the constraints that constitute Binding Theory (see Müller 2021a, Chapter 20 of this volume). Here is a very simple constraint:

<sup>14</sup>In some recent work, e.g. Sag (2012: 157–162) and Sag et al. (2020), the feature is used to distinguish positions that only allow an auxiliary from positions that allow any verb. Within this approach, auxiliaries (except support *do*) are unspecified for AUX, since they may appear in both [AUX +] and [AUX –] constructions. Non-auxiliary verbs are [AUX –]; see Abeillé (2021: Section 4), Chapter 12 of this volume.

<sup>15</sup>A list can be represented as a feature description with the features FIRST and REST, where the value of FIRST is the first element of the list. See Richter (2021: 102), Chapter 3 of this volume for more on the encoding of lists.

<sup>16</sup>The double-shafted arrow ⇒ is used in implicational constraints, and a single-shafted arrow ↦ in lexical rules.


(11) *phrase* ⇒ [COMPS ⟨⟩]

This says that a phrase has the empty list (⟨⟩) as the value of the COMPS feature, which means that it does not require any complements.<sup>17</sup> As we will see below, most constraints are more complex than (11) and impose a number of restrictions on certain objects. For this reason, one might speak of a set of constraints. However, we will continue to use the term "constraint" for objects of the form in (10), no matter how many restrictions are imposed. Particularly important are constraints dealing with the internal structure of various types of phrases. We will consider some constraints of this kind in Section 5.

In most HPSG work, some shortcuts are used to abbreviate a feature path; for example, in (11), COMPS stands for SYNSEM|LOC|CAT|COMPS. We use this practice in the rest of the chapter, and it is used throughout the Handbook.

# **4 The lexicon**

As noted above, the type *lexical-sign*, its subtypes, and the constraints on them are central to the lexicon of a language and the words it licenses.<sup>18</sup> Lexical rules are also important. Some of the earliest work in HPSG focused on the organisation of the lexicon and the question of how lexical generalisations can be captured, and detailed proposals have been developed.<sup>19</sup>

# **4.1 Lexemes and words**

In some frameworks, the lexicon contains not lexemes but morphemes, i.e. roots and affixes of various kinds. But most work in HPSG has assumed a realizational approach to morphology. Within this approach, there are no morphemes, just lexemes and the words that realise them, and affixes are just bits of phonology realising certain morphosyntactic features (Stump 2001; Anderson 1992). One consequence of this is that HPSG has no syntactic elements like the T(ense) and Num(ber) functional heads of Minimalism, which are mainly realised by affixes.

<sup>17</sup>The constraint in (11) is plausible for English, but it is too strong for some languages, especially for languages with complex predicates or partial VPs (see Godard & Samvelian 2021, Chapter 11 of this volume), and also for SOV languages if they are analysed in terms of binary branching (see Müller 2021b, Chapter 10 of this volume).

<sup>18</sup>Other types of constraint are relevant to the form of lexemes and words, e.g. constraints on *synsem* objects and on PHON values. These are also relevant to the form of phrases.

<sup>19</sup>The lexicon is more important in HPSG than in some other constructional approaches, e.g. that of Goldberg (1995; 2006). See Müller & Wechsler (2014) and Müller (2021c: Section 2), Chapter 32 of this volume for discussion.


See Crysmann (2021: Section 3), Chapter 21 of this volume, Davis & Koenig (2021: Section 2), Chapter 4 of this volume, and Borsley & Müller (2021: Section 4.1.3), Chapter 28 of this volume for discussion.

Probably the most important properties of any lexeme are its part of speech and its combinatorial properties. As we saw in the last section, the HEAD feature encodes part of speech information, while the SUBJ and COMPS features encode combinatorial information. As we also noted in the last section, HEAD takes as its value a *part-of-speech* object, and the type *part-of-speech* has subtypes such as *noun*, *verb*, and *adjective*. At least some of the subtypes have certain features. For example, in many languages, the type *noun* has the feature CASE with values like *nom*(*inative*), *acc*(*usative*), and *gen*(*itive*). Thus, a nominative pronoun like *I* might have a *part-of-speech* object of the form in (12) as its HEAD value.

$$(12)\quad\begin{bmatrix} \textit{noun} \\ \text{CASE} \;\; \textit{nom} \end{bmatrix}$$

Similarly, in many languages, the type *verb* has the feature VFORM with values like *fin*(*ite*) and *inf*(*initive*). Thus, the HEAD value of the word form *be* might be (13).

$$(13)\quad\begin{bmatrix} \textit{verb} \\ \text{VFORM} \;\; \textit{inf} \end{bmatrix}$$

In much the same way, the type *adjective* might have a feature distinguishing between positive, comparative, and superlative forms, in English and many other languages.

We must now say more about combinatorial properties. In much HPSG work, it is assumed that SUBJ and COMPS encode what might be regarded as superficial combinatorial information and that more basic combinatorial information is encoded by a feature ARG(UMENT)-ST(RUCTURE). <sup>20</sup> Normally the value of ARG-ST of a word is the concatenation of the values of SUBJ and COMPS, using ⊕ for list concatenation. In other words, we normally have the following situation (notice the use of re-entrancy or structure sharing):

$$(14)\quad\begin{bmatrix} \text{SUBJ} \;\; \boxed{1} \\ \text{COMPS} \;\; \boxed{2} \\ \text{ARG-ST} \;\; \boxed{1} \oplus \boxed{2} \end{bmatrix}$$

<sup>20</sup>ARG-ST is also crucial for Binding Theory, which takes the form of a number of constraints on ARG-ST lists. See Müller (2021a), Chapter 20 of this volume.


As noted earlier, it is generally assumed that the SUBJ list never has more than one member. The appropriate features for the word *read* in (1a), for example, would include the following, where the tags identify not lists but list members:

(15) Lexical item for *read*:
$$\begin{bmatrix} \text{SUBJ} \;\; \langle \boxed{1} \rangle \\ \text{COMPS} \;\; \langle \boxed{2} \rangle \\ \text{ARG-ST} \;\; \langle \boxed{1}\,\text{NP}, \boxed{2}\,\text{NP} \rangle \end{bmatrix}$$

Under some circumstances, however, we have something different. For example, it has been proposed, e.g. in Manning & Sag (1999: 65), that null subject sentences have an element representing the understood subject in the ARG-ST list of the main verb but nothing in the SUBJ list. Thus, the verb *czytałem* 'read' in (1b), repeated here as (16), has the features in (17).

(16) Czytałem książkę.
read.PST.1SG book.ACC
'I read a book.'

(17) Lexical item for *czytałem* 'read' with a null subject:
$$\begin{bmatrix} \text{SUBJ} \;\; \langle\,\rangle \\ \text{COMPS} \;\; \langle \boxed{1} \rangle \\ \text{ARG-ST} \;\; \langle \text{NP}, \boxed{1}\,\text{NP} \rangle \end{bmatrix}$$
A similar analysis is widely assumed for unbounded dependency gaps. On this analysis, the verb *say* in (2), repeated here as (18), has the features in (19).

(18) What did you say?

(19) Lexical item for *say* with the object extracted:
$$\begin{bmatrix} \text{SUBJ} \;\; \langle \boxed{1}\,\text{NP} \rangle \\ \text{COMPS} \;\; \langle\,\rangle \\ \text{ARG-ST} \;\; \langle \boxed{1}\,\text{NP}, \text{NP} \rangle \end{bmatrix}$$

It is also assumed that the arguments that are realised as pronominal affixes (traditionally known as clitics in Romance languages) are absent from COMPS lists (Miller & Sag 1997: Section 3; Monachesi 2005), and other differences between SUBJ, COMPS, and ARG-ST have been proposed for other languages (see Manning & Sag 1999, Davis, Koenig & Wechsler 2021: Section 3, Chapter 9 of this volume for discussion). In much work, the relation between ARG-ST, SUBJ, and COMPS is regulated by a constraint called the Argument Realisation Principle (ARP). The following is a simplified version of the constraint proposed in Ginzburg & Sag (2000: 171; see also Bouma et al. 2001: 12):

$$(20)\quad\textit{word} \Rightarrow \begin{bmatrix} \text{SUBJ} \;\; \boxed{1} \\ \text{COMPS} \;\; \boxed{2} \ominus \textit{list(non-canonical)} \\ \text{ARG-ST} \;\; \boxed{1} \oplus \boxed{2} \end{bmatrix}$$

This ensures that non-canonical arguments, including gaps and arguments realised as affixes, do not appear in COMPS lists.<sup>21</sup> Notice, however, that it says nothing special about subjects.<sup>22</sup> There are complex issues here, and the principle will probably take a different form in different languages. So we will not try to decide exactly what form it should take.

A variety of HPSG work assumes the SUBJ and COMPS features, but some work assumes a SPR (SPECIFIER) feature instead of, or in addition to, the SUBJ feature. Where it replaces SUBJ, the idea is that subjects are one of a number of types of specifiers, others being determiners within NPs and degree words like *so* and *too* within APs (Pollard & Sag 1994: 358). Where it is an additional feature, the idea is that there are a number of types of specifier, but subjects are not specifiers. Predicative nominals (e.g. *my cousin* in *Paul is my cousin*) may need both (Pollard & Sag 1994: Section 9.4.1; Ginzburg & Sag 2000: 409; Abeillé & Godard 2003). There are other positions in the HPSG community. Much early work has a single feature called SUBCAT instead of SUBJ and COMPS (Pollard & Sag 1987). Essentially the same position has been adopted within Sign-Based Construction Grammar, which has a single feature called VALENCE instead of SUBJ, SPR, and COMPS.<sup>23</sup> Obviously, there are some important issues here.

It is a central feature of lexical items that part of speech and combinatorial properties are separate matters. Members of the same part of speech can have different combinatorial properties, and members of different parts of speech can have the same combinatorial properties. Much HPSG work captures this fact by proposing that the type *lexeme* be cross-classified along two dimensions, one dealing with part of speech information and one dealing with argument selection information (Flickinger 1987: 20). Figure 4 is a simple illustration based on Ginzburg & Sag (2000: 20).

<sup>21</sup>As we saw above, the sign ⊕ means concatenation of lists. Ginzburg & Sag (2000: 170) state the following about ⊖: "Here '⊖' designates a relation of contained list difference. If L₂ is an ordering of a set Σ₂ and L₁ is a subordering of L₂, then L₂ ⊖ L₁ designates the list that results from removing all members of L₁ from L₂; if L₁ is not a sublist of L₂, then the contained list difference is not defined. For present purposes, ⊖ is interdefinable with the sequence union operator (○) of Reape (1994) and Kathol (1995): (L₃ = L₂ ⊖ L₁) ⇔ (L₂ = L₁ ○ L₃)." The operator ○ is called *shuffle* and is also explained in Müller (2021b: 391), Chapter 10 of this volume.

<sup>22</sup>Ginzburg & Sag (2000: 177–183) explicitly allow gaps in SUBJ lists, but this is controversial, as discussed in Borsley & Crysmann (2021: 547–548), Chapter 13 of this volume.

<sup>23</sup>SBCG also has a feature X-ARG, which picks out subjects and other external arguments. But unlike the other features mentioned here, this always has the same value in a head and its mother. Its role is to make information about external arguments available outside the phrases in which they appear. See Sag (2007; 2012: 84, 149–151).

Figure 4: Cross-classification of lexemes

Upper case letters are used for the two dimensions of classification, and *v-lx*, *intr-lx*, *s-rsg-lx*, and *srv-lx* abbreviate *verb-lexeme*, *intransitive-lexeme*, *subject-raising-lexeme*, and *subject-raising-verb-lexeme*, respectively. All these types will be subject to specific constraints. For example, *v-lx* will be subject to something like the following constraint, based on that in Ginzburg & Sag (2000: 22):

(21) *v-lx* ⇒
$$\begin{bmatrix} \text{HEAD} \;\; \textit{verb} \\ \text{ARG-ST} \;\; \langle \text{XP}, \dots \rangle \end{bmatrix}$$

This says that a verb lexeme has a *verb* part of speech and requires a phrase of some kind as its first (syntactic) argument (corresponding to its subject). Similarly, we will have something like the following constraint for *s-rsg-lx*:

$$(22)\quad\textit{s-rsg-lx} \Rightarrow \begin{bmatrix} \text{ARG-ST} \;\; \left\langle \boxed{1}, \begin{bmatrix} \text{SUBJ} \;\; \langle \boxed{1} \rangle \end{bmatrix}, \dots \right\rangle \end{bmatrix}$$

This says that a subject-raising-lexeme has (at least) two (syntactic) arguments, a subject and a complement, and that the subject is whatever the complement requires as a subject, indicated by the tag 1. Most of the properties of any lexeme will be inherited from its supertypes. Thus, very little information needs to be listed for each specific lexeme, and the richness of the lexical description comes from the classification in a system like this.


For example, for a subject-raising verb like *seem*, its CAT and CONTENT features are the following, using a simplified version of Minimal Recursion Semantics (MRS; Copestake et al. 2005): REL(ATION)S is the attribute for the list of elementary predications associated with a word, a lexeme, or a phrase, and SOA is for *state-of-affairs* (see Koenig & Richter 2021, Chapter 22 of this volume). *Seem* takes an infinitival VP complement.<sup>24</sup> Notice that the first syntactic argument (the subject) is not mentioned in the CONTENT, i.e. it is not assigned a semantic role by *seem* (see Abeillé 2021: Section 1, Chapter 12 of this volume).

(23) Constraints on type *seem-lx* in addition to those inherited from *srv-lx*: *seem-lx* ⇒

$$\begin{bmatrix} \text{CAT} \;\; \begin{bmatrix} \text{ARG-ST} \;\; \left\langle \boxed{1}, \text{VP}\big[\text{VFORM} \;\textit{inf}\,\big]\big[\text{INDEX} \;\boxed{2}\,\big] \right\rangle \end{bmatrix} \\ \text{CONT} \;\; \begin{bmatrix} \text{INDEX} \;\; \textit{s} \\ \text{RELS} \;\; \left\langle \begin{bmatrix} \textit{seem-rel} \\ \text{SOA} \;\; \boxed{2} \end{bmatrix} \right\rangle \end{bmatrix} \end{bmatrix}$$

Once these more specific features are combined with features from the type *srv-lx*, we get a more complete AVM like the following for the word *seem*:


(24) Constraints for the lexeme *seem*:

$$\begin{bmatrix} \textit{seem-lx} \\ \text{CAT} \;\; \begin{bmatrix} \text{SUBJ} \;\; \langle \boxed{1} \rangle \\ \text{COMPS} \;\; \langle \boxed{2} \rangle \\ \text{ARG-ST} \;\; \left\langle \boxed{1}, \boxed{2}\,\text{VP}\begin{bmatrix} \text{HEAD} \;\; \big[\text{VFORM} \;\textit{inf}\,\big] \\ \text{SUBJ} \;\; \langle \boxed{1} \rangle \\ \text{INDEX} \;\; \boxed{3} \end{bmatrix} \right\rangle \end{bmatrix} \\ \text{CONT} \;\; \begin{bmatrix} \text{INDEX} \;\; \textit{s} \\ \text{RELS} \;\; \left\langle \begin{bmatrix} \textit{seem-rel} \\ \text{SOA} \;\; \boxed{3} \end{bmatrix} \right\rangle \end{bmatrix} \end{bmatrix}$$

 Notice that the SUBJ value is underspecified. Thus, *seem* combines with an infinitival complement and with any subject (nominal or verbal, expletive or referential), provided this subject is appropriate for the infinitival complement (see Abeillé 2021: Section 2.1, Chapter 12 of this volume):

(25) a. It is/seems to be snowing.
b. \* Kim is/seems to be snowing.

<sup>24</sup>The entry can be modified to allow predicative complements, as well as a second *to* complement (*John seems tired/in a good mood to me*).



# **4.2 Lexical rules**

The hierarchy of lexical types provides one way of capturing lexical generalisations. Lexical rules provide another.<sup>25</sup> They are used in morphology to relate lexemes to words (inflection) and lexemes to lexemes (derivation) (see Crysmann 2021: Sections 2–3, Chapter 21 of this volume). For syntax, they are relevant especially to valence alternations such as that illustrated in the following (see Davis, Koenig & Wechsler 2021: Section 5.3, Chapter 9 of this volume):

(26) b. That Sandy was there is unimportant.
c. That Lee won impressed everyone.

(27) b. It is unimportant that Sandy was there.
c. It impressed everyone that Lee won.

These show that verbs and adjectives which allow a clausal subject generally also allow an expletive *it* subject and a clause as an extra complement (Pollard & Sag 1994: 150). The lexemes required for the latter use can be derived from the lexemes required for the former use by a lexical rule of the following form:<sup>26</sup>

(28) [ARG-ST ⟨S⟩ ⊕ 2] ↦ [ARG-ST ⟨NP[*it*]⟩ ⊕ 2 ⊕ ⟨S⟩]

<sup>25</sup>Lexical rules can be seen as a generative device, or alternatively, as a set of well-formedness conditions on the lexicon: if the lexicon contains items with description X, it must also contain items with description Y (Meurers 2001). See also Davis & Koenig (2021: Section 5), Chapter 4 of this volume.

<sup>26</sup>Another representation of lexical rules is an AVM with features INPUT and OUTPUT, or with the left hand side as a daughter. As for (27), assuming that both clauses and VPs have a verbal head, it easily extends to infinitival subjects, to accommodate pairs of examples like the following:

(i) a. To annoy Lee is easy.
b. It is easy to annoy Lee.

Clauses introduced by *that* are sometimes considered as CPs in HPSG (see Section 7), with verbs and complementisers as two subtypes of *verbal*.


The active-passive relation can be captured by a similar lexical rule (Flickinger 1987: Section 5.1.1). Since these rules do not change the CONTENT feature, these alternations will preserve the meaning of the verb or adjective lexeme (see Davis & Koenig 2021, Chapter 4 of this volume). Thus, the sentences in (27) will have a different syntactic structure from their counterparts in (26), but may have the same semantic representation (they will probably have different information structures, thus different CONTEXT features; see De Kuthy (2021), Chapter 23 of this volume on information structure).

# **5 Syntax**

As noted above, the type *phrase*, its subtypes, and the constraints on them are at the heart of the syntax of a language.<sup>27</sup> A simple hierarchy of phrase types was assumed in early HPSG, but what we have called Constructional HPSG employs complex hierarchies of phrase types comparable to the complex hierarchies of lexical types employed in the lexicon.

# **5.1 A hierarchy of phrase types**

Like much other work in syntax, HPSG takes from X-bar theory (Jackendoff 1977) the idea that the local trees that make up syntactic structures fall into a limited number of types. Like Jackendoff (1977), and unlike Minimalism, HPSG assumes that not all phrases are headed, even if many are, and does not limit the term *head* to lexical elements. Thus, among phrases there is a basic distinction between non-headed phrases and headed phrases. There are various kinds of headed phrase. We will consider three here. First there are head-complement phrases: combinations of a head and its complements. These can be headed by various parts of speech – verbs, prepositions, adjectives, nouns, and others – and may have one complement or more than one. Next, there are head-subject phrases. Typically, the head of such a phrase is a VP. However, the bracketed material in the following may well be head-subject phrases with a non-verbal head.

(29) With [Kim ill/in London/a candidate], anything is possible.

Finally, there are head-filler phrases: clauses in which an initial constituent is associated with a gap in the following constituent. *Wh*-interrogatives and *wh*-relatives, such as the bracketed material in the following, are typical examples.

<sup>27</sup>As noted in Footnote 18, constraints on *synsem* objects and PHON values are relevant to phrases as they are to lexemes and words.

Anne Abeillé & Robert D. Borsley

(30) a. I'm wondering [who I talked to].

b. This is the official [who I talked to].

All this suggests the simple type hierarchy in Figure 5. Each of these types is associated with a constraint capturing its distinctive properties.

Figure 5: A hierarchy of types of phrases

Consider first the type *headed-ph*. Here we need a constraint capturing what all headed phrases have in common. This is essentially that they have a head, with which they share certain features. But what features? One view is that the main features that are shared are those that are the value of HEAD. This is embodied in the following constraint, which is known as the Head Feature Principle:<sup>28</sup>

$$(31)\quad\textit{headed-ph} \Rightarrow \begin{bmatrix} \text{HEAD} \;\; \boxed{1} \\ \text{HD-DTR} \;\; \big[\text{HEAD} \;\; \boxed{1}\,\big] \end{bmatrix}$$

Each of the three subtypes of *headed-ph* is subject to a constraint embodying its distinctive properties. Here is a constraint on the type *hd-comp-ph* (with SYNSEM abbreviated as SS):

$$\text{(32)}\quad \textit{hd-comp-ph} \Rightarrow \begin{bmatrix} \text{HD-DTR } \boxed{1} \begin{bmatrix} \textit{word} \\ \text{COMPS } \langle \boxed{2}, \dots, \boxed{n} \rangle \end{bmatrix} \\ \text{DTRS } \langle \boxed{1}, \begin{bmatrix}\text{SS } \boxed{2}\end{bmatrix}, \dots, \begin{bmatrix}\text{SS } \boxed{n}\end{bmatrix} \rangle \end{bmatrix}$$

This ensures that a head-complement phrase has a word as a head daughter and non-head daughters with the *synsem* properties that appear in the head's COMPS list.<sup>29</sup> Notice that nothing is said about the SYNSEM value of the phrase. It will be

<sup>28</sup>HEAD here is an abbreviation for SYNSEM|LOC|CAT|HEAD. In later implicational constraints, we abbreviate SYNSEM|LOC|CAT|COMPS as COMPS and SYNSEM|LOC|CAT|SUBJ as SUBJ.

<sup>29</sup>The head could be identified as a [LEX +], [LIGHT +], or [WEIGHT *light*] phrase, to accommodate coordination of heads as in *John* [*knows and likes*] *this record* (Abeillé 2006: Section 5.1).


[COMPS ⟨⟩], as required by the constraint in (11), and it will have the same value for HEAD as the head daughter as a consequence of the Head Feature Principle. It must also have the same value for SUBJ as the head daughter. One might add this to the constraint in (32), but that would miss a generalisation. Head-complement phrases are not the only phrases which have the same value for SUBJ as their head. This is also a feature of head-filler phrases, as we will see below. It seems, in fact, that it is normal for a phrase to have the same value for any valence feature as its head. This is often attributed to the Valence Principle, which can be stated informally as follows (cf. Sag & Wasow 1999: 86):

(33) Unless some constraint says otherwise, the mother's values for the valence features are identical to those of the head daughter.
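To make the division of labour between the Head Feature Principle (31), the head-complement constraint (32), and the Valence Principle (33) concrete, here is a minimal sketch in Python. The dictionary encoding, the lowercase feature paths, and the function names are our own expository assumptions, not part of the HPSG formalism; the point is only that the principles amount to checkable identity constraints on a mother and its head daughter.

```python
# A minimal sketch (expository encoding): feature structures as nested
# Python dicts, with the Head Feature Principle and Valence Principle
# recast as executable checks on a mother/head-daughter pair.

def head(sign):
    # HEAD abbreviates SYNSEM|LOC|CAT|HEAD, as in the chapter's notation.
    return sign["synsem"]["loc"]["cat"]["head"]

def valence(sign, feat):
    return sign["synsem"]["loc"]["cat"][feat]

def satisfies_hfp(mother, head_dtr):
    # (31): a headed phrase shares its HEAD value with its head daughter.
    return head(mother) == head(head_dtr)

def satisfies_valence_principle(mother, head_dtr, overridden=()):
    # (33): unless some constraint says otherwise, the mother's values
    # for the valence features are those of the head daughter.
    return all(valence(mother, f) == valence(head_dtr, f)
               for f in ("subj", "comps") if f not in overridden)

# A transitive verb and the VP it heads: COMPS is emptied at phrase
# level, while the SUBJ requirement is passed up unchanged.
v = {"synsem": {"loc": {"cat": {"head": "verb",
                                "subj": ["NP"], "comps": ["NP"]}}}}
vp = {"synsem": {"loc": {"cat": {"head": "verb",
                                 "subj": ["NP"], "comps": []}}}}

assert satisfies_hfp(vp, v)
# COMPS is overridden by the hd-comp-ph constraint (32); SUBJ is not.
assert satisfies_valence_principle(vp, v, overridden=("comps",))
```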

There is no assumption in HPSG that all branching is binary.<sup>30</sup> Hence, where a head takes two complements, both may be its sisters. An example of the sort of structures that the analysis licenses is illustrated in Figure 6.

Figure 6: A tree for a head-complement phrase

Instead of the Head Feature Principle and the Valence Principle, Ginzburg & Sag (2000: 33) propose the Generalised Head Feature Principle, which takes the following form:

<sup>30</sup>However, binary branching has been assumed in HPSG grammars for a number of languages. See Müller (2021b: Section 3), Chapter 10 of this volume.


$$\text{(34)}\quad \textit{headed-ph} \Rightarrow \begin{bmatrix} \text{SYNSEM } / \boxed{1} \\ \text{HD-DTR } \begin{bmatrix} \text{SYNSEM } / \boxed{1} \end{bmatrix} \end{bmatrix}$$

The slashes (/) here indicate that this is a default constraint (Lascarides & Copestake 1999). Thus, it says that a headed phrase and its head daughter have the same SYNSEM value unless some other constraint requires something different. In versions of HPSG which assume this constraint, it is responsible for the fact that a head-complement phrase has the same value for SUBJ as the head daughter, among many other things.

We turn now to the type *hd-subj-ph*. Here we need a constraint which mentions the SYNSEM value of the phrase – more precisely, its SUBJ value – and not just the daughters, as follows:

$$\text{(35)}\quad \textit{hd-subj-ph} \Rightarrow \begin{bmatrix} \text{SUBJ } \langle\rangle \\ \text{HD-DTR } \boxed{1} \begin{bmatrix} \text{SUBJ } \langle \boxed{2} \rangle \\ \text{COMPS } \langle\rangle \end{bmatrix} \\ \text{DTRS } \langle \begin{bmatrix}\text{SS } \boxed{2}\end{bmatrix}, \boxed{1} \rangle \end{bmatrix}$$

This ensures that a head-subject phrase is [SUBJ ⟨⟩] and has a head daughter which is [COMPS ⟨⟩] and a non-head daughter with the *synsem* properties that appear in the head's SUBJ list.<sup>31</sup> It licenses structures like that in Figure 7.

Finally, we consider the type *hd-filler-ph*. This involves the feature SLASH, one of the features contained in the value of the feature NONLOCAL introduced earlier in (9). Its value is a set of *local* objects, and it encodes information about unbounded dependency gaps (see Borsley & Crysmann 2021, Chapter 13 of this volume). Here is the relevant constraint:<sup>32</sup>

$$\text{(36)}\quad \textit{hd-filler-ph} \Rightarrow \begin{bmatrix} \text{SLASH } \boxed{1} \\ \text{HD-DTR } \boxed{3} \begin{bmatrix} \text{SLASH } \boxed{1} \cup \{\boxed{2}\} \\ \text{COMPS } \langle\rangle \end{bmatrix} \\ \text{DTRS } \langle \begin{bmatrix}\text{LOC } \boxed{2}\end{bmatrix}, \boxed{3} \rangle \end{bmatrix}$$
<sup>31</sup>Instead of requiring the head to be [COMPS ⟨⟩], one might require it to be a phrase (which would be required by (11) to be [COMPS ⟨⟩]). However, this would require e.g. *laughed* in *Kim laughed* to be analysed as a phrase consisting of a single word. With (35), it can be analysed as just a word.

<sup>32</sup>We use ∪ for set union. Notice that the mother category does not have to have an empty SLASH set, thus allowing for multiple extractions (*Paul, who could we talk to about?*, where *Paul* is understood as the object of *about* and *who* as the object of *to*).


Figure 7: A tree for a head-subject phrase (*They give some money to charity*)

This says that a head-filler phrase has a head daughter with a SLASH set which is the SLASH set of the head-filler phrase plus one other *local* object, and a non-head daughter whose LOCAL value is the additional *local* object of the head daughter. The set 1 in (36) is normally empty.<sup>33</sup> Figure 8 illustrates a typical head-filler phrase.

Notice that the head daughter in a head-filler phrase is not required to have an empty SUBJ list (it is not marked as [SUBJ ⟨⟩]) and hence does not have to be a head-subject phrase. It can also be a head-complement phrase (a VP), as in the following:

(37) I'm wondering [who [to talk to]].

Either the Valence Principle or the Generalised Head Feature Principle will ensure that a head-filler phrase has the same value for SUBJ as its head daughter.
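The interaction of SLASH values just described can also be made concrete. In the following sketch, local objects are encoded as strings and the function name is an expository assumption; the check mirrors the prose gloss of the head-filler constraint, including the non-empty SLASH remainder allowed by Footnote 32.

```python
# A hedged sketch of the head-filler configuration: the head daughter's
# SLASH set is the mother's SLASH set plus one local object, which is
# identified with the filler daughter's LOCAL value.

def check_head_filler(mother_slash, head_dtr_slash, filler_local):
    # The filler discharges exactly one member of the head daughter's
    # SLASH set; whatever remains is passed up to the mother.
    return (filler_local in head_dtr_slash and
            mother_slash == head_dtr_slash - {filler_local})

# "who I talked to": "I talked to _" is SLASH {NP}, the filler "who"
# supplies the matching local object, and the mother is SLASH {}.
assert check_head_filler(set(), {"NP[who]"}, "NP[who]")

# Multiple extraction (Footnote 32): one dependency is discharged while
# another remains pending on the mother.
assert check_head_filler({"NP[Paul]"}, {"NP[Paul]", "NP[who]"}, "NP[who]")
```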

The constraints that we have just discussed are rather like phrase structure rules. This led Ginzburg & Sag (2000: 33) to use an informal notation which reflects this. This involves the phrase type on the first line followed by a colon, and information about the phrase itself and its daughters on the second line, separated by an arrow and with the head daughter identified by "**H**".

<sup>33</sup>As with (35), one might substitute *phrase* here for [COMPS ⟨⟩]. But this would mean that *to* in *I would do it but I don't know how to* must be analysed as a phrase containing a single word. With (36), it can be just a word.

Figure 8: A tree for a head-filler phrase

Thus, instead of (38a), one has (38b).

$$\text{(38)}\quad \text{a. } \textit{phrase-type} \Rightarrow \begin{bmatrix} \text{SYNSEM X} \\ \text{DTRS } \langle \boxed{1}\,\text{Y}, \text{Z} \rangle \\ \text{HD-DTR } \boxed{1} \end{bmatrix}$$

$$\phantom{\text{(38)}\quad} \text{b. } \begin{array}{l} \textit{phrase-type:} \\ \text{X} \rightarrow \textbf{H}[\text{Y}], \text{Z} \end{array}$$

Notice that while the double arrow in (38a) has the normal "if-then" interpretation, the single arrow in (38b) means "consists of". In some circumstances, this informal notation may be more convenient than the more formal notation used in (38a).

In the preceding discussion, we have ignored the semantics of the phrase. Leaving aside quantification and other complex matters, and assuming INDEX and REL(ATION)S as in MRS (as shown in (23) above), the CONTENT of a headed phrase can be handled via two semantic principles: a coindexing principle (the INDEX of a headed phrase is the INDEX of its HEAD-DTR) and a "compositionality" principle (the RELS of a phrase is the concatenation of the RELS of its DTRS; Copestake et al. 2005: Section 4.3.2, Section 5; Koenig & Richter 2021: Section 6.1, Chapter 22 of this volume).

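These two principles lend themselves to a direct, if simplified, computation. In the sketch below, the dictionary encoding of CONTENT values and the relation attributes are expository assumptions that only loosely follow MRS practice.

```python
# A sketch of the two semantic principles: the INDEX of a headed phrase
# is the INDEX of its head daughter, and its RELS list is the
# concatenation of the RELS lists of all its daughters.

def compose_content(head_dtr, dtrs):
    return {"index": head_dtr["index"],
            "rels": [r for d in dtrs for r in d["rels"]]}

kim = {"index": "x", "rels": [{"PRED": "name", "ARG0": "x", "NAME": "Kim"}]}
laughed = {"index": "e", "rels": [{"PRED": "laugh", "ARG0": "e", "ARG1": "x"}]}

s = compose_content(laughed, [kim, laughed])
assert s["index"] == "e"    # the clause's index is the verb's event index
assert len(s["rels"]) == 2  # the daughters' relations are collected
```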

The type hierarchy in Figure 5 is simplified in a number of respects. It includes no non-headed phrases.<sup>34</sup> It also ignores various other subtypes of *headed-phrase*, some of which are discussed in the next section. Most importantly, it is widely assumed that the type *phrase*, like the type *lexeme*, can be cross-classified along two dimensions, one dealing with head-dependent relations and the other dealing with the properties of various types of clauses. A simplified illustration is given in Figure 9.

Figure 9: Cross-classification of phrases

Here *wh-interr-cl* is identified as a subtype of *head-filler-phrase* and a subtype of *interr*(*ogative*)*-cl*. As such, it has both the properties required by the constraint in (36) and certain properties characteristic of interrogative clauses, most obviously interrogative semantics.

# **5.2 Constituency and constituent order**

We must now say something about constituent order. In much HPSG work, this is a matter of phonology: more precisely, a matter of the relation between the PHON value of a phrase and the PHON values of its daughters.<sup>35</sup> Consider, for example, a phrase with two daughters, each with its own PHON value. The PHON value of the phrase will be the concatenation of the PHON values of the daughters.

<sup>34</sup>The most important type of non-headed phrase is coordinate structure. See Abeillé & Chaves (2021), Chapter 16 of this volume for discussion.

<sup>35</sup>As discussed in Section 7.1, in some HPSG work, linear order is a property of so-called order domains, which essentially mediate constituent structure and phonology (see Müller 2021b: Section 6, Chapter 10 of this volume).


Clearly, they can be concatenated in two ways as in (39), or their order may be left unspecified for "free" word order:<sup>36</sup>

$$\text{(39)}\quad \text{a. } \begin{bmatrix} \text{PHON } \boxed{1} \oplus \boxed{2} \\ \text{DTRS } \langle \begin{bmatrix}\text{PHON } \boxed{1}\end{bmatrix}, \begin{bmatrix}\text{PHON } \boxed{2}\end{bmatrix} \rangle \end{bmatrix}$$

$$\phantom{\text{(39)}\quad} \text{b. } \begin{bmatrix} \text{PHON } \boxed{2} \oplus \boxed{1} \\ \text{DTRS } \langle \begin{bmatrix}\text{PHON } \boxed{1}\end{bmatrix}, \begin{bmatrix}\text{PHON } \boxed{2}\end{bmatrix} \rangle \end{bmatrix}$$

Within this approach, the following English and Welsh examples might have exactly the same analysis (a head-adjunct phrase) except for their PHON values:

(40) a. black sheep

b. defaid du
sheep.PL black
'black sheep'

Similarly, a prepositional phrase in English and a postpositional phrase in Japanese might have the same analysis (a head-complement phrase) apart from their PHON values. Ordering rules are constraints on phrasal types. They are commonly written with < ("precedes"). Thus, languages with head-complement order might have the rule in (41a), and languages with complement-head order the rule in (41b).

$$\text{(41)}\quad \text{a. } \begin{bmatrix}\text{COMPS } \langle \dots, \boxed{1}, \dots \rangle\end{bmatrix} < \begin{bmatrix}\text{SYNSEM } \boxed{1}\end{bmatrix}$$

$$\phantom{\text{(41)}\quad} \text{b. } \begin{bmatrix}\text{SYNSEM } \boxed{1}\end{bmatrix} < \begin{bmatrix}\text{COMPS } \langle \dots, \boxed{1}, \dots \rangle\end{bmatrix}$$

But it should be remembered that ordering rules are well-formedness constraints on structures built with certain concatenations of PHON values as in (39).<sup>37</sup>
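The following sketch illustrates that view: the mother's PHON is some ordering of its daughters' PHON values, and precedence statements like (41) filter out the inadmissible orderings. The pair-based encoding of the rules is an expository assumption, and full shuffle-style interleaving (Footnote 36) is left aside; only whole-daughter orderings are generated.

```python
# A sketch of ordering rules as well-formedness constraints: generate
# the possible orderings of the daughters' PHON values and keep those
# satisfying all linear precedence (LP) rules.

from itertools import permutations

def admissible_orders(daughters, lp_rules):
    # daughters: list of (label, phon) pairs; lp_rules: set of (a, b)
    # pairs meaning "a precedes b". With no rules, order is free.
    for perm in permutations(daughters):
        labels = [label for label, _ in perm]
        if all(labels.index(a) < labels.index(b)
               for a, b in lp_rules if a in labels and b in labels):
            yield [word for _, phon in perm for word in phon]

dtrs = [("head", ["defaid"]), ("adjunct", ["du"])]
# Head-initial order, as in the Welsh example (40b): head < adjunct.
assert list(admissible_orders(dtrs, {("head", "adjunct")})) == \
    [["defaid", "du"]]
# With no ordering rule, both concatenations in (39) are licensed.
assert len(list(admissible_orders(dtrs, set()))) == 2
```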

Not all pairs of expressions which might be seen as differing just in word order have the same analysis apart from their PHON values. Consider, for example, the following:

(42) a. Kim is late.

b. Is Kim late?


<sup>36</sup>Unspecified order means any interleaving of 1 and 2 obtained with the shuffle operation, written 1 ○ 2 (see Footnote 21).

<sup>37</sup>An alternative notation, provided different daughters are distinguished with different names, could be:

(i) a. HD-DTR < COMPS-DTRS
b. COMPS-DTRS < HD-DTR


Here, we have a declarative and a related interrogative. They differ semantically and in word order, but for most work in HPSG, they also differ in their syntactic structures. (42a) is a head-subject phrase much like that in Figure 7. Clauses like (42b), on the other hand, are standardly seen as ternary branching phrases in which both the subject and the complement are a sister of the auxiliary (Pollard & Sag 1994: 40). This requires an additional phrase type, which might be called *head-subject-complement-phrase*.<sup>38</sup>

# **6 Further syntactic structures**

Head-complement phrases, head-subject phrases, and head-filler phrases are perhaps the most important types of syntactic structures, but there are others that are of considerable importance. Here we will say something about three of them: head-adjunct phrases, head-specifier phrases, and head-marker phrases.

# **6.1 Adjuncts**

Adverbs, adverbial PPs within VPs, attributive adjectives, and relative clauses within NPs are commonly viewed as adjuncts. Thus, the following illustrate head-adjunct phrases (with the head following the adjunct in (43a) and (43c) and preceding in (43b) and (43d)):

(43) b. Kim [[met Lee] in the pub]
c. a [new [book about syntax]]
d. a [[book about syntax] which impresses everyone]

In much HPSG work, adjuncts select the heads they combine with through a feature MOD(IFIES) whose value is a *synsem* object, while other signs are [MOD *none*]. Thus, (43a) involves the schematic structure in Figure 10.

In the case of adverbs, adverbial PPs, and attributive adjectives, it is a simple matter to assign an appropriate value to MOD, and this value can be underspecified to account for the polymorphism of certain adverbs which can modify all (major) categories (Abeillé & Godard 2003: 28–29). In the case of relative clauses, it is more complex because the value of MOD must be coindexed with the *wh*-element, if there is one, or with the gap, if there is none. In (43d), this is reflected in the

<sup>38</sup>In Ginzburg & Sag (2000: 36), it is called *sai-phrase*. In some HPSG work, e.g. Sag et al. (2003: 409–414), examples like (42b) are analysed as involving an auxiliary verb with two complements and no subject. This approach has no need for an additional phrase type, but it requires an alternative valence description for auxiliary verbs.

Figure 10: A tree for a head-adjunct phrase

fact that the verb in the relative clause is the singular *impresses* and not the plural *impress*. See Borsley & Crysmann (2021), Chapter 13 of this volume and Arnold & Godard (2021), Chapter 14 of this volume for some discussion.

Notice also that in head-adjunct phrases, the adjunct is not a syntactic head, but may well be the semantic head. This is an example of the difference between syntactic head and semantic head, and between syntactic argument and semantic argument in HPSG.

Although an adjunct analysis of adverbial PPs seems quite natural, it has been argued in some HPSG work that they are in fact optional complements of verbs (see e.g. Abeillé & Godard 1997; Bouma et al. 2001: 4; Ginzburg & Sag 2000: 168, Footnote 2). On this view, *in the pub* in (43b) is much like the same phrase in (44), where it is clearly a (predicative) complement:

(44) Kim is in the pub.

Various arguments have been advanced for this position, but it is controversial and it is rejected by Levine (2003), Levine & Hukari (2006: Chapter 3), and Chaves (2009). There is an unresolved issue here.<sup>39</sup>

<sup>39</sup>It has been argued that some adverbs and PPs are adjuncts and others are complements, depending on word order, case, and so on (see, for example, Przepiórkowski 1999, Hassamal & Abeillé 2014, and Kim 2021: Section 2.3, Chapter 18 of this volume).


# **6.2 Specifiers and markers**

As noted earlier, some HPSG work assumes a feature SPR (SPECIFIER) which is realised by various categories. In some work, subjects are analysed as specifiers (Sag, Wasow & Bender 2003: 100–103), but in other approaches, they are realisations of a SUBJ(ECT) feature, as discussed in the last section. For some HPSG work, e.g. Pollard & Sag (1994: Section 9.4) and Sag et al. (2003: Section 4.3), determiners within NPs are an important example of specifiers. On this view, *the pub* has the schematic structure in Figure 11.

Figure 11: A tree for a head-specifier phrase

Some recent work, e.g. Sag (2012: 84), has adopted a rather different view of at least some determiners, namely that they are what are known as markers, a notion first introduced in Pollard & Sag (1994: Section 1.6). These are nonheads which select the head that they combine with through a SELECT feature (Van Eynde 1998; Van Eynde 2021: Section 2.3, Chapter 8 of this volume) but determine the MARKING value of their mother. Within this approach, *the pub* has the schematic structure in Figure 12.<sup>40</sup>

A marker analysis was originally proposed for complementisers. However, they have also been analysed as heads within HPSG, e.g. in Sag (1997: 456–458)

<sup>40</sup>Work which assumes the SELECT feature also uses it instead of MOD for adjuncts and considers both markers and adjuncts to be "functors" (Van Eynde 1998; Van Eynde 2021: Section 2.3.2, Chapter 8 of this volume).

Figure 12: A tree for a head-functor phrase

and Ginzburg & Sag (2000: Section 2.8). There is no consensus here.

# **7 Further topics**

There are many other aspects of HPSG that could be discussed in this chapter, but we will focus on just two: what are known as order domains, and the distinguishing properties of the SBCG version of HPSG.

# **7.1 Order domains**

We noted above that much HPSG work views word order as a matter of phonology, specifically a matter of the relation between the PHON value of a phrase and the PHON values of its daughters (see Müller 2021b, Chapter 10 of this volume). Some work in HPSG argues that this is too simple in that it ties the observed order too closely to constituent structure. Consider the following examples:

(45) a. A man who looked like Churchill came into the room.
b. A man came into the room who looked like Churchill.

One might assume that these show different observed orders because they have different structures (Kiss 2005), but one might also want to claim that they have the same constituent structure (Kathol & Pollard 1995). This is possible if the


observed order is not a simple reflection of constituent structure. Much work in HPSG has proposed that the observed order is a reflection not of the constituent structure of an expression but of a separate system of order domains (see Reape 1994; Müller 1996; Kathol 2000). Within this approach, ordering rules may order non-sister elements, as long as they belong to the same order domain: the constituent structure of an expression can be encoded as the value of a DTRS (DAUGHTERS) feature and the order domain as the value of a DOM(AIN) feature. Adopting this position, one might propose that (45b) has the schematic analysis in (46).

$$\text{(46)}\quad \begin{bmatrix} \text{SYNSEM S} \\ \text{DTRS } \langle [\textit{a man who looked like Churchill}], [\textit{came into the room}] \rangle \\ \text{DOM } \langle [\textit{a man}], [\textit{came into the room}], [\textit{who looked like Churchill}] \rangle \end{bmatrix}$$

Here the clause has two daughters but three domain elements. The simpler example in (45a) will have two daughters and two domain elements.

It is worth noting that this approach allows a different analysis for interrogatives like (42b). It would be possible to propose an analysis in which they have two daughters and three domain elements as follows:

$$\text{(47)}\quad \begin{bmatrix} \text{SYNSEM S} \\ \text{DTRS } \langle [\textit{Kim}], [\textit{is late}] \rangle \\ \text{DOM } \langle [\textit{is}], [\textit{Kim}], [\textit{late}] \rangle \end{bmatrix}$$

As far as we are aware, no one has proposed such an analysis for English interrogatives, but essentially this analysis is proposed for German interrogatives in Kathol (2000: 81).<sup>41</sup>
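The essential point of the DTRS/DOM split can be shown with a toy encoding, where plain word lists stand in for signs and domain elements (an expository simplification, not Reape's or Kathol's formalism):

```python
# A sketch of order domains: constituent structure (DTRS) and linear
# order (DOM) are separate values, so (45b) can have the same two
# daughters as (45a) while its domain interleaves the relative clause.

sentence_45b = {
    "synsem": "S",
    "dtrs": [["a", "man", "who", "looked", "like", "Churchill"],
             ["came", "into", "the", "room"]],
    "dom":  [["a", "man"],
             ["came", "into", "the", "room"],
             ["who", "looked", "like", "Churchill"]],
}

def observed_order(sign):
    # The observed string is read off DOM, not DTRS.
    return [w for element in sign["dom"] for w in element]

assert observed_order(sentence_45b) == \
    "a man came into the room who looked like Churchill".split()
```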

Order domains seem most plausible as an approach to the sorts of discontinuity that are found in so-called nonconfigurational languages such as Warlpiri (Donohue & Sag 1999). However, they may well have a role to play in more familiar languages (Bonami et al. 1999; Chaves 2014). But exactly how much of a role they should play in syntax is an unresolved matter.

One might wonder whether a version of HPSG that includes order domains is still a monostratal framework. It remains a framework in which linguistic expressions have a single constituent structure. However, it does have a second important level of representation, which makes available a variety of analyses

<sup>41</sup>Kathol (2000) assumes that order domains are divided into topological fields and shows how this idea allows an interesting approach to various aspects of clausal word order. See Borsley (2006) for an application of this idea to negation.


which would otherwise not be possible. Whether the framework is still monostratal depends on how exactly the term is used. We will not take a stand on this.

# **7.2 Sign-Based Construction Grammar**

The SBCG version of HPSG will be discussed in some detail in the next chapter (Flickinger, Pollard & Wasow 2021: 68–70), in the chapter on unbounded dependencies (Borsley & Crysmann 2021: Section 10), and in the chapter on HPSG and Construction Grammar (Müller 2021c: Section 1.3.2). Here we will just highlight the central difference between this approach and earlier work. The term "construction" is widely used in connection with the earlier Constructional HPSG, but within that work, constructions are just types of sign. In contrast, for SBCG, signs and constructions are quite different objects.

For SBCG, constructions are objects which associate a MTR (MOTHER) sign with a list of DAUGHTER signs, one of which is a HEAD-DAUGHTER in a headed construction. Thus, constructions take the form in (48a) and headed-constructions the form in (48b):

$$\text{(48)}\quad \text{a. } \begin{bmatrix} \textit{cx} \\ \text{MTR } \textit{sign} \\ \text{DTRS } \textit{list(sign)} \end{bmatrix} \qquad \text{b. } \begin{bmatrix} \textit{headed-cx} \\ \text{MTR } \textit{sign} \\ \text{DTRS } \textit{list(sign)} \\ \text{HD-DTR } \textit{sign} \end{bmatrix}$$

Constructions are utilised by the Sign Principle, which can be formulated as follows:<sup>42</sup>

(49) Signs are well formed if either (a) they match some lexical entry, or (b) they match the mother of some construction.

Constructions and the Sign Principle are properties of SBCG which are lacking in earlier work. Essentially, then, they are complications. But they allow simplifications. In particular, they mean that signs do not need to have the features DTRS and HD-DTR. This in turn allows the framework to dispense with the feature SYNSEM and the type *synsem*. These elements are necessary in earlier HPSG

<sup>42</sup>Lexical rules are analysed in SBCG as lexical constructions. Thus, (b) covers derived words as well as phrases.


because taking the value of COMPS to be a list of signs would incorrectly predict that heads may select complements not just with specific syntactic and semantic properties, but also with specific kinds of internal structure. For example, it would allow a verb to select as its complement a phrase whose head has a specific type of complement. To exclude this possibility, earlier versions of HPSG seem to need SYNSEM and *synsem* (Pollard & Sag 1994: 23). In SBCG, it is excluded by the assumption that signs do not have the features DTRS and HD-DTR, and so SYNSEM and *synsem* are unnecessary. Thus, SBCG is both more complex and simpler than earlier versions of the framework. This means that considerations of simplicity do not obviously favour or disfavour the approach.
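As a rough illustration of how the Sign Principle in (49) uses constructions, the following sketch treats a construction as a function from a list of daughter signs to a mother sign and checks well-formedness against a toy lexicon. The flat encoding and the toy subject-head construction are expository assumptions.

```python
# A sketch of the SBCG Sign Principle: a sign is well formed if it
# matches a lexical entry or the MTR of some construction. Note that
# the mother sign records no daughters: SBCG signs lack DTRS/HD-DTR.

LEXICON = [{"FORM": "Kim", "CAT": "NP"}, {"FORM": "laughed", "CAT": "VP"}]

def subject_head_cx(dtrs):
    # A toy headed construction: NP + VP yields an S.
    if len(dtrs) == 2 and dtrs[0]["CAT"] == "NP" and dtrs[1]["CAT"] == "VP":
        return {"FORM": dtrs[0]["FORM"] + " " + dtrs[1]["FORM"], "CAT": "S"}
    return None

def well_formed(sign, dtrs=None):
    if sign in LEXICON:          # clause (a): matches a lexical entry
        return True
    if dtrs is not None:         # clause (b): matches the MTR of a cx
        return subject_head_cx(dtrs) == sign
    return False

np, vp = LEXICON
assert well_formed(np)
assert well_formed({"FORM": "Kim laughed", "CAT": "S"}, dtrs=[np, vp])
```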

# **8 Concluding remarks**

In the preceding pages, we have spelled out the basic properties of HPSG and the assumptions it makes about the nature of linguistic analyses and the conduct of linguistic research. We have looked at the types, features, and constraints that are the building blocks of HPSG analyses. We have also outlined the HPSG approach to the lexicon and the basics of its approach to syntax, and we have considered some of the main types of syntactic structure. Finally, we have discussed order domains and SBCG. More can be learned about all of these matters in the chapters that follow.

# **Acknowledgements**

We are grateful to Stefan Müller, Jean-Pierre Koenig, and Frank Richter for many helpful comments on earlier versions of this chapter. We alone are responsible for what appears here.

# **References**



Wechsler, Stephen & Ash Asudeh. 2021. HPSG and Lexical Functional Grammar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 1395–1446. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599878.

# **Chapter 2**

# **The evolution of HPSG**

Dan Flickinger Stanford University

Carl Pollard Ohio State University

Thomas Wasow

Stanford University

HPSG was developed to express insights from theoretical linguistics in a precise formalism that was computationally tractable. It drew ideas from a wide variety of traditions in linguistics, logic, and computer science. Its chief architects were Carl Pollard and Ivan Sag, and its most direct precursors were Generalized Phrase Structure Grammar and Head Grammar. The theory has been applied in the construction of computational systems for the analysis of a variety of languages; a few of these systems have been used in practical applications. This chapter sketches the history of the development and application of the theory.

# **1 Introduction**

From its inception in 1983, HPSG was intended to serve as a framework for the formulation and implementation of natural language grammars which are (i) linguistically motivated, (ii) formally explicit, and (iii) computationally tractable. These desiderata are reflective of HPSG's dual origins as an academic linguistic theory and as part of an industrial grammar implementation project with an eye toward potential practical applications. Here (i) means that the grammars are intended as scientific theories about the languages in question, and that the analyses the grammars give rise to are transparently relatable to the predictions

Dan Flickinger, Carl Pollard & Thomas Wasow. 2021. The evolution of HPSG. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 47–87. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599820


(empirical consequences) of those theories. Thus HPSG shares the general concerns of the theoretical linguistics literature, including distinguishing between well-formed and ill-formed expressions and capturing linguistically significant generalizations. (ii) means that the notation for the grammars and its interpretation have a precise grounding in logic, mathematics, and theoretical computer science, so that there is never any ambiguity about the intended meaning of a rule or principle of grammar, and so that grammars have determinate empirical consequences. (iii) means that the grammars can be translated into computer programs that can handle linguistic expressions embodying the full range of complex interacting phenomena that naturally occur in the target languages, and can do so with a tolerable cost in space and time resources.

The two principal architects of HPSG were Carl Pollard and Ivan Sag, but a great many other people made important contributions to its development. Many, but by no means all, are cited in the chronology presented in the following sections. There are today a number of groups of HPSG researchers around the world, in many cases involved in building HPSG-based computational systems. While the number of practitioners is relatively small, it is a very active community that holds annual meetings and publishes quite extensively.<sup>1</sup> Hence, although Pollard no longer works on HPSG and Sag died in 2013, the theory is very much alive, and still evolving.

# **2 Precursors**

HPSG arose between 1983 and 1985 from the complex interaction between two lines of research in theoretical linguistics: (i) work on context-free Generative Grammar (CFG) initiated in the late 1970s by Gerald Gazdar and Geoffrey Pullum, soon joined by Ivan Sag, Ewan Klein, Tom Wasow, and others, resulting in the framework referred to as Generalized Phrase Structure Grammar (GPSG: Gazdar, Klein, Pullum & Sag 1985); and (ii) Carl Pollard's Stanford dissertation research, under Sag and Wasow's supervision, on Generalized Context-Free Grammar, and more specifically Head Grammar (HG: Pollard 1984).

# **2.1 Generalized Phrase Structure Grammar**

In the earliest versions of Generative Grammar (Chomsky 1957), the focus was on motivating transformations to express generalizations about classes of sentences. In the 1960s, as generative linguists began to attend more explicitly to

<sup>1</sup>See https://hpsg.hu-berlin.de/HPSG-Bib/ for a list of HPSG publications.


meaning, a division arose between those advocating using the machinery of transformations to capture semantic generalizations and those advocating the use of other types of formal devices. This division became quite heated, and was subsequently dubbed "the linguistic wars" (see Newmeyer 1980: Chapter 5; Harris 1993). Much of the work in theoretical syntax and semantics during the 1970s explored ways to constrain the power of transformations (see especially Chomsky 1973 and Chomsky & Lasnik 1977), and non-transformational approaches to the analysis of meaning (see especially Montague 1974 and Dowty 1979).

These developments led a few linguists to begin questioning the central role transformations had played in syntactic research of the preceding two decades (notably, Bresnan 1978). This questioning of Transformational Grammar (TG) culminated in a series of papers by Gerald Gazdar, which (in those pre-internet days) were widely distributed as paper manuscripts. The project that they laid out was succinctly summarized in one of Gazdar's later publications as follows:

Consider eliminating the transformational component of a generative grammar. (Gazdar 1981: 155)

The framework that emerged became known as Generalized Phrase Structure Grammar; a good account of its development is Ted Briscoe's interview of Gazdar in November 2000.<sup>2</sup>

GPSG developed in response to several criticisms leveled against transformational grammar. First, TG was highly underformalized, to the extent that it was unclear what its claims—and the empirical consequences of those claims—amounted to; CFG, by comparison, was a simple and explicit mathematical formalism. Second, given the TG architecture of a context-free base together with a set of transformations, the claimed necessity of transformations was standardly justified on the basis of arguments that CFGs were insufficiently expressive to serve as a general foundation for natural language (NL) grammar; but Pullum & Gazdar (1982) showed all such arguments presented up to that time to be logically flawed or else based on false empirical claims. And third, closely related to the previous point, they showed that transformational grammarians had been insufficiently resourceful in exploiting what expressive power CFGs *did* possess, especially through the use of complex categories bearing features whose values might themselves bear features of their own. For example, coordinate constructions and unbounded dependency constructions had long served as prime exemplars of the need for transformations, but Gazdar (1981) was able to show that

<sup>2</sup>https://nlp.fi.muni.cz/~xjakub/briscoe-gazdar/, 2021-01-15.


both kinds of constructions, as well as interactions between them, did in fact yield straightforward analyses within the framework of a CFG.

Gazdar and Pullum's early work in this vein was quickly embraced by Sag and Wasow at Stanford University, both formally inclined former students of Chomsky's, who saw it as the logical conclusion of a trend in Chomskyan syntax toward constraining the transformational component. That trend, in turn, was a response, at least in part, to (i) the demonstration by Peters & Ritchie (1973) that Chomsky's (1965) Standard Theory, when precisely formalized, was totally unconstrained, in the sense of generating all recursively enumerable languages; and (ii) the insight of Emonds (1976) that most of the transformations proposed up to that time were "structure-preserving" in the sense that the trees they produced were isomorphic to ones that were base-generated. Besides directly addressing these issues of excess power and structure preservation, the hypothesis that NLs were context-free also had the advantage that CFGs were well-known by computer scientists to have decidable recognition problems and efficient parsing algorithms, facts which seemed to have some promise of bearing on questions of the psychological plausibility and computational tractability of the grammars in question.

Aside from serving as a framework for theoretical linguistic research, GPSG also provided the theoretical underpinnings for a natural language processing (NLP) project established in 1981 by Egon Loebner at Hewlett-Packard Laboratories in Palo Alto. This project, which led in due course to the first computer implementation of HPSG, is described below.

# **2.2 Head Grammar**

Pollard, with a background in pure mathematics, Chinese historical phonology, and 1930s–1950s-style American structural linguistics, arrived at Stanford in 1979 with the intention of getting a PhD in Chinese linguistics, but was soon won over to theoretical syntax by Wasow and Sag. He had no exposure to Chomskyan linguistics, but was immediately attracted to the emerging nontransformational approaches, especially the early GPSG papers and the contemporaneous forms of CG in Bach (1979; 1980) and Dowty (1982a; 1982b), in part because of their formal simplicity and rigor, but also because the formalism of CFG was (and is) easy to read as a more technically precise rendering of structuralist ideas about syntax (as presented, e.g., in Bloomfield 1933 and Hockett 1958).

Although Pullum & Gazdar (1982) successfully refuted all published arguments to date that CFGs were inadequate for analyzing NLs, by the following year, Stuart Shieber had developed an argument (published in Shieber 1985), which was


(and remains) generally accepted as correct, that there could not be a CFG that accounted for the cross-serial dependencies in Swiss German; and Chris Culy showed, in his Stanford M.A. thesis (cf. Culy 1985), that the presence of reduplicative compounding in Bambara precluded a CF analysis of that language. At the same time, Bach and Dowty (independently) had been experimenting with generalizations of traditional A-B (Ajdukiewicz–Bar-Hillel) CG which allowed for modes of combining strings (such as reduplication, wrapping, insertion, cliticization, and the like) in addition to the usual concatenation. This latter development was closely related to a wider interest among nontransformational linguists of the time in the notion of discontinuous constituency, and also had an obvious affinity to Hockett's (1954) item-and-process conception of linguistic structure, albeit at the level of words and phrases rather than morphemes. One of the principal aims of Pollard's dissertation work was to provide a general framework for syntactic (and semantic) analysis that went beyond—but not too far beyond—the limits of CFG in a way that took such developments into account.

Among the generalizations of CFG that Pollard studied, special attention was given to HGs, which differ from CFGs in two respects: (i) the role of strings was taken over by headed strings, essentially a string with one of its words designated as its head; and (ii) besides concatenation, headed strings can also be combined by inserting one string directly to the left or right of another string's head. An appendix of his dissertation (Pollard 1984: Appendix 1) provided an analysis of discontinuous constituency in Dutch, and that analysis also works for Swiss German. In another appendix, Pollard used a generalization of the CKY algorithm to prove that the head languages (HLs, the languages analyzed by HGs) shared with CFLs the property of deterministic polynomial-time recognition complexity, but of order $n^7$, subsequently reduced by Kasami, Seki & Fujii (1989) to $n^6$, as compared with order $n^3$ for CFLs. For additional formal properties of HGs, see Roach (1987). Vijay-Shanker & Weir (1994) proved that HGs had the same weak generative capacity as three other grammar formalisms – Combinatory Categorial Grammar (Steedman 1987; Steedman 1990), Lexicalized Tree-Adjoining Grammar (Schabes 1990), and Linear Indexed Grammar (Gazdar 1988) – and the corresponding class of languages became known as *mildly context-sensitive*.
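For orientation, the algorithm being generalized is the standard CKY recognizer, which decides membership for a CFG in Chomsky normal form in time of order $n^3$. The sketch below shows only the classical algorithm; Pollard's head-grammar variant with its order-$n^7$ bound is not reproduced here, and the grammar encoding is our own.

```python
# The textbook CKY recognizer for a CFG in Chomsky normal form.
# chart[i][j] holds the nonterminals deriving words[i:j].

def cky_recognize(words, lexical, binary, start="S"):
    # lexical: {terminal: set of nonterminals}
    # binary:  {(B, C): set of nonterminals A with A -> B C}
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(i + 1, i + span):
                for b in chart[i][k]:
                    for c in chart[k][i + span]:
                        chart[i][i + span] |= binary.get((b, c), set())
    return start in chart[0][n]

lexical = {"Kim": {"NP"}, "saw": {"V"}, "Lee": {"NP"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
assert cky_recognize("Kim saw Lee".split(), lexical, binary)
```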

Although the handling of linearization in HG seems not to have been pursued further within the HPSG framework, the ideas that (i) linearization had to involve data structures richer than strings of phoneme strings, and (ii) the way these structures were linearized had to involve operations other than mere concatenation, were implicit in subsequent HPSG work, starting with Pollard & Sag's (1987: 169) Constituent Order Principle (which was really more of a promissory note than an actual principle). These and related ideas would become more


fully fleshed out a decade later within the linearization grammar avatar of HPSG developed by Reape (1996), Reape (1992), Kathol (1995; 2000), and Müller (1995; 1996; 1999; 2004). (See also Müller (2021b: Section 6), Chapter 10 of this volume on linearization approaches in HPSG.) On the other hand, two other innovations of HG, both related to the system of syntactic features, were incorporated into HPSG, and indeed should probably be considered the defining characteristics of that framework, namely the list-valued SUBCAT and SLASH features, discussed below.

# **3 The HP NL project**

Work on GPSG culminated in the 1985 book *Generalized Phrase Structure Grammar* by Gazdar, Klein, Pullum, and Sag. During the writing of that book, Sag taught a course on the theory, with participation of his co-authors. The course was attended not only by Stanford students and faculty, but also by linguists from throughout the area around Stanford, including the Berkeley and Santa Cruz campuses of the University of California, as well as people from nearby industrial labs. One of the attendees at this course was Anne Paulson, a programmer from Hewlett-Packard (HP) Laboratories in nearby Palo Alto, who had some background in linguistics from her undergraduate education at Brown University. Paulson told her supervisor at HP Labs, Egon Loebner, that she thought the theory could be implemented and might be turned into something useful. Loebner, a multi-lingual polymathic engineer, had no background in linguistics, but he was intrigued, and invited Sag to meet and discuss setting up a natural language processing project at HP. Sag brought along Gazdar, Pullum, and Wasow. This led to the creation of the project that eventually gave rise to HPSG. Gazdar, who would be returning to England relatively soon, declined the invitation to be part of the new project, but Pullum, who had taken a position at the University of California at Santa Cruz (about an hour's drive from Palo Alto), accepted. So the project began with Sag, Pullum, and Wasow hired on a part-time basis to work with Paulson and two other HP programmers, John Lamping and Jonathan King, to implement a GPSG of English at HP Labs. J. Mark Gawron, a linguistics graduate student from Berkeley who had attended Sag's course, was very soon added to the team.

The initial stages consisted of the linguists and programmers coming up with a notation that would serve the purposes of both. Once this was accomplished, the linguists set to work writing a grammar of English in Lisp to run on the DEC-20 mainframe computer that they all worked on. The first publication coming out


of this project was a 1982 Association for Computational Linguistics paper. The paper's conclusion begins:

What we have outlined is a natural language system that is a direct implementation of a linguistic theory. We have argued that in this case the linguistic theory has the special appeal of computational tractability (promoted by its context-freeness), and that the system as a whole offers the hope of a happy marriage of linguistic theory, mathematical logic, and advanced computer applications. (Gawron et al. 1982: 80)

This goal was carried over into HPSG.

It should be mentioned that the HP group was by no means alone in these concerns. The early 1980s was a period of rapid growth in computational linguistics (due at least in part to the rapid growth in the power and accessibility of computers). In the immediate vicinity of Stanford and HP Labs, there were at least two other groups working on developing natural language systems that were both computationally tractable and linguistically motivated. One such group was at the Xerox Palo Alto Research Center, where Ron Kaplan and Joan Bresnan (in collaboration with a number of other researchers, notably Martin Kay) were developing Lexical Functional Grammar;<sup>3</sup> the other was at SRI International, where a large subset of SRI's artificial intelligence researchers (including Barbara Grosz, Jerry Hobbs, Bob Moore, Hans Uszkoreit, Fernando Pereira, and Stuart Shieber) worked on natural language. Thanks to the founding of the Center for the Study of Language and Information (CSLI) at Stanford in the early 1980s, there was a great deal of interaction among these three research groups. Although some aspects of the work being done at the three non-Stanford sites were proprietary, most of the research was basic enough that there was a fairly free flow of ideas among the three groups about building linguistically motivated natural language systems.

Other projects seeking to develop theories combining computational tractability with linguistic motivation were also underway outside of the immediate vicinity of Stanford, notably at the Universities of Pennsylvania and Edinburgh. Aravind Joshi and his students were working on Tree Adjoining Grammars (Joshi, Levy & Takahashi 1975; Joshi 1987), while Mark Steedman and others were developing Combinatory Categorial Grammar (Steedman 1987; Steedman 1990).

During the first few years of the HP NL project, several Stanford students were hired as part-time help. One was Pollard, who was writing his doctoral

<sup>3</sup>For a comparison of HPSG and LFG, see Wechsler & Asudeh (2021), Chapter 30 of this volume. A handbook of LFG parallel to this handbook is in preparation (Dalrymple 2021).


dissertation under Sag's supervision. Ideas from his thesis work played a major role in the transition from GPSG to HPSG. Two other students who became very important to the project were Dan Flickinger, a doctoral student in linguistics, and Derek Proudian, who was working on an individually-designed undergraduate major when he first began at HP and later became a master's student in computer science. Both Flickinger and Proudian became full-time HP employees after finishing their degrees. Over the years, a number of other HP employees also worked on the project and made substantial contributions. They included Susan Brennan, Lewis Creary, Marilyn Friedman (now Walker), Dave Goddeau, Brett Kessler, Joachim Laubsch, and John Nerbonne. Brennan, Walker, Kessler, and Nerbonne all later went on to academic careers at major universities, doing research dealing with natural language processing.

The HP NL project lasted until the early 1990s. By then, a fairly large and robust grammar of English had been implemented. The period around 1990 combined an economic recession with what has sometimes been termed an "AI winter" – that is, a period in which enthusiasm and hence funding for artificial intelligence research was at a particularly low ebb. Since NLP was considered a branch of AI, support for it waned. Hence, it was not surprising that the leadership of HP Labs decided to terminate the project. Flickinger and Proudian came to an agreement with HP that allowed them to use the NLP technology developed by the project to launch a new start-up company, which they named Eloquent Software. They were, however, unable to secure the capital necessary to turn the existing system into a product, so the company never got off the ground.

# **4 The emergence of HPSG**

A few important features of GPSG that were later carried over into HPSG are worth mentioning here. First, GPSG borrowed from Montague the idea that each phrase structure rule was to be paired with a semantic rule providing a recipe for computing the meaning of the mother from the meanings of its daughters (Gazdar 1981: 156); this design feature was shared with contemporaneous forms of Categorial Grammar (CG) being studied by such linguists as Emmon Bach (Bach 1979; 1980) and David Dowty (Dowty 1982a; Dowty 1982b). Second, the specific inventory of features employed in GPSG for making fine-grained categorial distinctions (such as case, agreement, verb inflectional form, and the like), was largely preserved, though the technical implementation of morphosyntactic features in HPSG was somewhat different. And third, the SLASH feature, which originated in Gazdar's (1981) derived categories (e.g. S/NP), and which was used


to keep track of unbounded dependencies, was generalized in HPSG to allow for multiple unbounded dependencies (as in the notorious violins-and-sonatas example in (1) below). As will be discussed, this SLASH feature bears a superficial—and misleading—resemblance to the Categorial Grammar connectives written as '/' and '\'. On the other hand, a centrally important architectural feature of GPSG absent from HPSG (and from HG) was the device of metarules, higher-order rules used to generate the full set of context-free phrase structure rules (PSRs) from an initial inventory of basic PSRs. Among the metarules were ones used to introduce non-null SLASH values and propagate them upward through trees to a position where they were discharged by combination with a matching constituent called a filler (analogous to a *wh*-moved expression in TG).

A note is in order about the sometimes confusing use of the names *Head Grammar* (*HG*) and *HPSG*. Strictly speaking, HG was a specific subtype of generalized CFG developed in Pollard's dissertation work, but the term *HG* did not appear in academic linguistic publications with the exception of the Pollard & Sag (1983) WCCFL paper, which introduced the distinction between head features and binding features (the latter were incorporated into GPSG under the name *foot features*). In the summer of 1982, Pollard had started working part time on the HP NL project; and the term *HPSG* was first employed (by Pullum) in reference to an extensive reworking by Pollard and Paulson of the then-current HP GPSG implementation, incorporating some of the main features of Pollard's dissertation work in progress, carried out over the summer of 1983, while much of the HP NLP team (including Pullum and Sag) was away at the LSA Institute in Los Angeles. The implication of the name change was that whatever this new system was, it was no longer GPSG.

Once this first HPSG implementation was in place, the NLP work at HP was considered to be within the framework of HPSG, rather than GPSG. After Pollard completed his dissertation, he continued to refer to *HG* in invited talks as late as autumn 1984; but his talk at the (December 1984) LSA Binding Theory Symposium used *HPSG* instead, and after that, the term *HG* was supplanted by *HPSG* (except in publications by non-linguists about formal language theory). One additional complication is that until the Gazdar, Klein, Pullum & Sag (1985) volume appeared, GPSG and HPSG were developing side by side, with considerable interaction. Pollard, together with Flickinger, Wasow, Nerbonne, and others, did HPSG; Gazdar and Klein did GPSG; and Sag and Pullum worked both sides of the street.

HPSG papers, about both theory and implementation, began to appear in 1985, starting with Pollard's WCCFL paper *Phrase structure grammar without metarules*


(Pollard 1985), and his paper at the Categorial Grammar conference in Tucson (Pollard 1988), comparing and contrasting HPSG with then-current versions of Categorial Grammar due to Bach, Dowty, and Steedman. These were followed by a trio of ACL papers documenting the current state of the HPSG implementation at HP Labs: Creary & Pollard (1985), Flickinger, Pollard & Wasow (1985), and Proudian & Pollard (1985). Of those three, the most significant in terms of its influence on the subsequent development of the HPSG framework was the second, which showed how the lexicon could be (and in fact was) organized using multiple-inheritance knowledge representation; Flickinger's Stanford dissertation (Flickinger 1987) was an in-depth exploration of that idea.

# **5 Early HPSG**

Setting aside implementation details, early HPSG can be characterized by the following architectural features:

**Elimination of metarules** Although metarules were a central feature of GPSG, they were also problematic: Uszkoreit & Peters (1982) had shown that if metarules were allowed to apply to their own outputs, then the resulting grammars were no longer guaranteed to generate CFLs; indeed, such grammars could generate all recursively enumerable languages. And so, in GPSG, the closure of a set of base phrase structure rules (PSRs) under a set of metarules was defined in such a way that no metarule could apply to a PSR whose own derivation involved an application of that metarule. This definition was intended to ensure that the closure of a finite set of PSRs remained finite, and therefore still constituted a CFG.

So, for example, the metarule STM1 was used in GPSG to convert a PSR into another PSR, one of whose daughters is [+NULL] (informally speaking, a "trace"), and feature cooccurrence restrictions (FCRs) guaranteed that such daughters would bear a SLASH value, and that this SLASH value would also appear on the mother. Unfortunately, the finite closure definition described above does not preclude the possibility of derived PSRs whose mother carries multiple, in fact unboundedly many SLASH values (e.g. NP/NP, (NP/NP)/NP, etc.). And this in turn leads to an infinite set of PSRs, outside the realm of CF-ness (see Ristad 1986). Of course, one could rein in this excess power by imposing another FCR that disallows categories of the form (X/Y)/Z; but then there is no way to analyze sentences containing a constituent with two undischarged unbounded dependencies, such as the VP complement of *easy* in the following example:


(1) Violins this finely crafted, even the most challenging sonatas are easy to [play \_ on \_]. (adapted from Pollard & Sag 1994: 169)

GPSG avoided this problem by not analyzing such examples. In HPSG (Pollard 1985), by contrast, such examples were analyzed straightforwardly by replacing GPSG's category-valued SLASH feature with one whose values were lists (or sets) of categories. This approach still gave rise to an infinite set of rules, but since maintaining context-freeness was no longer at stake, this was not seen as problematic. The infinitude of rules in HPSG arose not through a violation of finite closure (since there were no longer any metarules at all), but because each of the handful of schematic PSRs (see below) could be directly instantiated in an infinite number of ways, given that the presence of list-valued features gave rise to an infinite set of categories.

**Lexical rules** GPSG, generalizing a suggestion of Flickinger (1983), constrained metarules to apply only to PSRs that introduced a lexical head. Pollard (1985) took this idea a step further, noting that many proposed metarules could be reformulated as lexical rules that (among other effects) operated on the subcategorization frames (encoded by the SUBCAT feature discussed below) of lexical entries. The idea of capturing some linguistic generalizations by means of rules internal to the lexicon had been explored by generative grammarians since Jackendoff (1975); and lexical rules of essentially the kind Pollard proposed were employed by Bach (1983), Dowty (1978), and others working in Categorial Grammar. Examples of constructions handled by metarules in GPSG but in HPSG by lexical rules included sentential extraposition, subject extraction, and the passive. Flickinger, Pollard & Wasow (1985) argued for an architecture for the lexicon that combined lexical rules with multiple inheritance using a frame-based knowledge representation system (Minsky 1975), on the basis of both overall grammar simplicity and efficient, easily modifiable implementation.

**CG-like treatment of subcategorization** GPSG treated subcategorization using an integer-valued feature called SUBCAT that in effect indexed each lexical item with the rule that introduced and provided its subcategorization frame; e.g. *weep* was listed in the lexicon with SUBCAT value 1 while *devour* was listed with SUBCAT value 2, and then PSRs of roughly the form in (2) guaranteed that lexical heads would have the right kinds of complements.

(2) VP → V[SUBCAT 1]
VP → V[SUBCAT 2] NP


In HPSG, by contrast, the SUBCAT feature directly characterized the grammatical arguments selected by a head (not just the complements, but the subject too) as a list of categories, so that e.g. *weep* was listed as V[SUBCAT ⟨NP⟩] but *devour* as V[SUBCAT ⟨NP, NP⟩] (where the first occurrence of NP refers to the object and the second to the subject). This treatment of argument selection was inspired by Categorial Grammar, where the same verbs would have been categorized as NP\S and (NP\S)/NP respectively;<sup>4</sup> the main differences are that (i) the CG treatment also encodes the directionality of the argument relative to the head, and (ii) in HPSG, all the arguments appear on one list, while in CG they are "picked up" one at a time, with as many connectives (/ or \) as there are arguments. In particular, as in the CG of Dowty (1982b), the subject was defined as the last argument, except that in HPSG, "last" now referred to the rightmost position on the SUBCAT list, not to the most deeply embedded connective. In HPSG, this ordering of the categories on the SUBCAT list was related not just to CG, but also to the traditional grammatical notion of obliqueness, and also to the accessibility hierarchy of Keenan & Comrie (1977). See Müller & Wechsler (2014: Section 4) for a more detailed discussion of these developments from GPSG to HPSG.

**Schematic rules** Unlike CFG but like CG, HPSG had only a handful of schematic rules. For example, in Pollard (1985), a substantial chunk of English "local" grammar (i.e. leaving aside unbounded dependencies) was handled by three rules: (i) a rule (used for subject-auxiliary inversion) that forms a sentence from an inverted (+INV) lexical head and all its arguments; (ii) a rule that forms a phrase from a head with SUBCAT list of length > 1 together with all its non-subject arguments; and (iii) a rule that forms a sentence from a head with a SUBCAT value of length one together with its single (subject) argument.
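A small sketch may help here: the SUBCAT list shrinks as a head combines with its arguments, with schematic rule (ii) consuming all non-subject arguments and rule (iii) the remaining subject. The dictionary encoding and function names are expository assumptions.

```python
# A sketch of early-HPSG SUBCAT cancellation: the verb's arguments form
# one list, with the subject as the last (rightmost) element.

weep = {"HEAD": "verb", "SUBCAT": ["NP"]}          # V[SUBCAT <NP>]
devour = {"HEAD": "verb", "SUBCAT": ["NP", "NP"]}  # object, then subject

def combine_with_complements(head, comps):
    # Schematic rule (ii): a head with SUBCAT of length > 1 combines
    # with all its non-subject arguments, leaving only the subject.
    assert head["SUBCAT"][:-1] == comps
    return {"HEAD": head["HEAD"], "SUBCAT": head["SUBCAT"][-1:]}

def combine_with_subject(head, subj):
    # Schematic rule (iii): a head whose SUBCAT holds one (subject)
    # argument combines with it to form a saturated sign.
    assert head["SUBCAT"] == [subj]
    return {"HEAD": head["HEAD"], "SUBCAT": []}

vp = combine_with_complements(devour, ["NP"])  # "devour the cake"
s = combine_with_subject(vp, "NP")             # "Kim devours the cake"
assert s["SUBCAT"] == []
```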

**List- (or set-) valued SLASH feature** The list-valued SLASH was introduced in Pollard (1985) to handle multiple unbounded dependencies, instead of the GPSG category-valued SLASH (which in turn originated as the *derived categories* of Gazdar (1981), e.g. S/NP). In spite of the notational similarity, though, the PSG SLASH is not an analog of the CG slashes / and \ (though HPSG's SUBCAT is, as explained above). In fact, HPSG's SLASH has no analog in the kinds of CGs being developed by Montague semanticists such as Bach (1979; 1980) and Dowty (1982a) in the late 1970s and early 1980s, which followed the CGs of Bar-Hillel (1954) in having only rules for eliminating (or canceling) slashes as in (3):

<sup>4</sup>We adhere to the Lambek convention for functor categories, so that expressions seeking to combine with an A on the left to form a B are written "A\B" (not "B\A").


$$\text{(3)}\quad \frac{\mathrm{A} \quad \mathrm{A}\backslash\mathrm{B}}{\mathrm{B}} \qquad \frac{\mathrm{B}/\mathrm{A} \quad \mathrm{A}}{\mathrm{B}}$$

To find an analog to HPSG's SLASH in CG, we have to turn to the kinds of CGs invented by Lambek (1958), which unfortunately were not yet well-known to linguists (though that would soon change, starting with Lambek's appearance at the 1985 Categorial Grammar conference in Tucson). What sets apart grammars of this kind (and their elaborations by Moortgat (1989), Oehrle et al. (1988), Morrill (1994), and many others) is the existence of rules for hypothetical proof (not given here), which allow a hypothesized category occurrence introduced into a tree (thought of as a proof) to be discharged.

In the Gentzen style of natural deduction (see Pollard 2013), hypothesized categories are written to the left of the symbol ⊢ (turnstile), so that the two slash elimination rules above take the following form, where Γ and Δ are lists of categories, and comma represents list concatenation as in (4):

$$\text{(4)}\quad \frac{\Gamma \vdash \mathrm{A} \qquad \Delta \vdash \mathrm{A}\backslash\mathrm{B}}{\Gamma, \Delta \vdash \mathrm{B}} \qquad \frac{\Gamma \vdash \mathrm{B}/\mathrm{A} \qquad \Delta \vdash \mathrm{A}}{\Gamma, \Delta \vdash \mathrm{B}}$$

These rules serve to propagate hypotheses (analogous to linguists' traces) downward through the proof tree (downward because logicians' trees are upside down with the conclusion, or "root", at the bottom). In HPSG notation, these same rules can be written as one rule (since SUBCAT is nondirectional) in (5):

$$\text{(5)}\quad \frac{\mathrm{B}[\text{SUBCAT}\ \langle \dots, \mathrm{A}\rangle, \text{SLASH}\ \Gamma] \qquad \mathrm{A}[\text{SLASH}\ \Delta]}{\mathrm{B}[\text{SUBCAT}\ \langle \dots \rangle, \text{SLASH}\ \Gamma, \Delta]}$$

This in turn is a special case of an HPSG principle first known as the Binding Inheritance Principle (BIP) and later as the Nonlocal Feature Principle (binding features included SLASH as well as the features QUE and REL used for tracking undischarged interrogative and relative pronouns). The original statement of the BIP (Pollard 1986) treated SLASH as set- rather than list-valued:

The value of a binding feature on the mother is the union of the values of that feature on the daughters. (Pollard 1986)

For example, the doubly-gapped VP in the violins-and-sonatas example in (1) is analyzed in HPSG roughly as is shown in Figure 1 and essentially the same way in Lambek-style CG:


Figure 1: *play on* as part of *Violins this finely crafted, even the most challenging sonatas are easy to play on.*

$$\text{(6)}\quad
\dfrac{\dfrac{\overset{\textit{play}}{\vdash ((\mathrm{NP}\backslash\mathrm{S})/\mathrm{PP})/\mathrm{NP}}\qquad \overset{t}{\mathrm{NP} \vdash \mathrm{NP}}}{\mathrm{NP} \vdash (\mathrm{NP}\backslash\mathrm{S})/\mathrm{PP}}\qquad \dfrac{\overset{\textit{on}}{\vdash \mathrm{PP}/\mathrm{NP}}\qquad \overset{t}{\mathrm{NP} \vdash \mathrm{NP}}}{\mathrm{NP} \vdash \mathrm{PP}}}{\mathrm{NP}, \mathrm{NP} \vdash \mathrm{NP}\backslash\mathrm{S}}$$

Aside from the binary branching of the Lambek analysis, the main difference is that HPSG traces of the form A[SLASH {A}] correspond to Lambek axioms of the form A ⊢ A, which is the standard mechanism for introducing hypotheses in Gentzen-style natural deduction.
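The set-valued version of the BIP can be sketched directly: modelling SLASH values as Python sets, the mother's SLASH is simply the union of its daughters' SLASH values. The tree below is a hypothetical simplification of the analysis in Figure 1, with invented gap labels:

```python
# A minimal sketch of the Binding Inheritance Principle for SLASH,
# assuming signs are dicts and SLASH values are sets of gap labels.

def mother_slash(daughters: list) -> set:
    """BIP: the mother's binding-feature value is the union of the
    daughters' values."""
    slash: set = set()
    for d in daughters:
        slash |= d["slash"]
    return slash

# Each trace introduces its own category into SLASH:
trace_1 = {"cat": "NP", "slash": {"NP-1"}}
trace_2 = {"cat": "NP", "slash": {"NP-2"}}
play = {"cat": "V", "slash": set()}
on = {"cat": "P", "slash": set()}

pp = {"cat": "PP", "slash": mother_slash([on, trace_2])}
vp = {"cat": "VP", "slash": mother_slash([play, trace_1, pp])}
assert vp["slash"] == {"NP-1", "NP-2"}    # the doubly-gapped VP
```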

An overview and elaboration of early HPSG is provided by the two books Pollard & Sag (1987) and Pollard & Sag (1994). Confusingly, the former is called *Information-Based Syntax and Semantics, Volume 1: Fundamentals*, and the second simply *Head-Driven Phrase Structure Grammar* (not *Information-Based Syntax and Semantics, Volume 2*). The reason for the title change had to do with a change in the underlying mathematical theory of feature structures. In the first book, following work in theoretical computer science by Rounds & Kasper (1986) and Moshier & Rounds (1987), feature structures were treated as data structures that supplied partial information about the linguistic objects being theorized about; this perspective in turn was based on Scott's (1982) mathematical theory of computation in terms of what he called information systems. Subsequently, Paul King persuaded Pollard and Sag that it was more straightforward to distinguish between feature structures, thought of as formal models of the linguistic objects, and feature descriptions or formulas of feature logic, which provided partial information about them, as described in his Manchester dissertation (King 1989). Although the formal issues involved in distinguishing between the two approaches are of interest in their own right, they seem not to have had a lasting effect on how theoretical linguists used HPSG, nor on how computational linguists implemented it. As for subject matter, Pollard & Sag (1987) was limited to the most basic notions, including syntactic features and categories (including the distinction between head features and binding features); subcategorization and the distinction between arguments and adjuncts (the latter of which necessitated one more rule schema beyond the three proposed by Pollard 1985); basic principles of grammar (especially the Head Feature Principle and the Subcategorization Principle); the obliqueness order and constituent ordering; and the organization of the lexicon by means of a multiple inheritance hierarchy and lexical rules. Pollard & Sag (1994) used HPSG to analyze a wide range of phenomena, primarily in English, that had figured prominently in the syntactic literature of the 1960s–1980s, including agreement, expletive pronoun constructions, raising, control, filler-gap constructions (including island constraints and parasitic gaps); so-called Binding Theory (the distribution of reflexive pronouns, non-reflexive pronouns, and non-pronominal NPs), and scope of quantificational NPs. These topics are also handled in respective chapters of this handbook (Wechsler 2021; Abeillé 2021; Borsley & Crysmann 2021; Chaves 2021; Müller 2021a; Koenig & Richter 2021: Section 3).

# **6 Theoretical developments**

Three decades of vigorous work since Pollard & Sag 1987 developing the theoretical framework of HPSG receive detailed discussion throughout the present volume, but we highlight here two significant stages in that development. The first is in Chapter 9 of Pollard & Sag (1994), where a pair of major revisions to the framework presented in the first eight chapters are adopted, changing the analysis of valence and of unbounded dependencies. Following Borsley (1987; 1988; 1989; 1990), Pollard and Sag moved to distinguish subjects from complements, and further to distinguish subjects from specifiers, thus replacing the single SUBCAT attribute with SUBJ, SPR, and COMPS. This formal distinction between subjects and complements enabled an improved analysis of unbounded dependencies, eliminating traces altogether by introducing three lexical rules for the extraction of subjects, complements, and adjuncts respectively. It is this revised analysis of valence constraints that came to be viewed as part of the standard HPSG framework, though issues of valence representation cross-linguistically remain a matter of robust debate.

The second notable stage of development was the introduction of a type hierarchy of *constructions* as descriptions of phrasal feature structures, employed first by Sag (1997) in a richly detailed analysis of a wide variety of relative clause phenomena in English. This extension of the use of hierarchically organized descriptions of typed feature structures from the lexicon to syntactic rules preserved the ability to express general principles holding for rule schemata while also enabling expression of the idiosyncratic properties of phrases. In Abeillé & Borsley (2021), Chapter 1 of this volume, the version of the framework with this extended use of types is termed *Constructional HPSG*; it includes the further elaboration by Ginzburg & Sag (2000) into a comprehensive analysis of interrogatives in English.

# **7 The LinGO project**

In the early 1990s, a consortium of research centers in Germany secured funding from the German government for a large project in spoken language machine translation, called Verb*mobil* (Wahlster 2000), which aimed to combine a variety of methods and frameworks in a single implemented state-of-the-art demonstrator system. Grammars of German and English were to be implemented in HPSG, to be used both for parsing and for generation in the translation of human-human dialogues, with a German grammar initially implemented by Pollard and Tibor Kiss at IBM in Heidelberg, later replaced by one developed by Stefan Müller and Walter Kasper at the German AI Research Center (DFKI), coordinator for the Verb*mobil* project. The DFKI contracted in 1993 with Sag at CSLI to design and implement the English grammar, with Flickinger brought over from HP Labs to help lead the effort, forming a new research group at CSLI initially called ERGO (for English Resource Grammar Online), later generalized to the name LinGO (Linguistic Grammars Online). Early LinGO members included Wasow and linguistics graduate student Rob Malouf, who authored the initial implementation of the English Resource Grammar (ERG), along with four other linguistics graduate students, Emily Bender, Kathryn Campbell-Kibler, Tony Davis, and Susanne Riehemann.

During the first of the two four-year phases of the Verb*mobil* project, the focus was on designing and implementing core syntactic and semantic analyses, initially using the DISCO/PAGE platform (Uszkoreit et al. 1994) developed at the DFKI, and largely informed by the framework presented in Pollard & Sag (1994). However, a more computationally useful semantic formalism emerged, called Minimal Recursion Semantics (MRS: Copestake, Flickinger, Pollard & Sag 2005), which Ann Copestake, formerly of the European ACQUILEX project, helped to design. Copestake also expanded the LKB system (Copestake 2002) which had been used in ACQUILEX, to serve as the grammar development environment for the LinGO project, including both a parser and a generator for typed feature structure grammars.

The second four years of the Verb*mobil* project emphasized development of the generation capabilities of the ERG, along with steady expansion of linguistic coverage, and elaboration of the MRS framework. LinGO contributors in this phase included Sag, Wasow, Flickinger, Malouf, Copestake, Riehemann, and Bender, along with a regular visitor and steady contributor from the DFKI, Stephan Oepen. Verb*mobil* had meanwhile added Japanese alongside German (Müller & Kasper 2000) and English (Flickinger, Copestake & Sag 2000) for more translation pairs, giving rise to another relatively broad-coverage HPSG grammar, Jacy, authored by Melanie Siegel at the DFKI (Siegel 2000). Work continued at the DFKI, of course, on the German HPSG grammar, written by Stefan Müller, adapted from his earlier Babel grammars (Müller 1999), and with semantics contributed by Walter Kasper.

Before the end of Verb*mobil* funding in 2000, the LinGO project had already begun to diversify into other application and research areas using the ERG, including over the next several years work on augmented/adaptive communication, multiword expressions, and hybrid processing with statistical methods, variously funded by the National Science Foundation, the Scottish government, and industrial partners including IBM and NTT. At the turn of the millennium, Flickinger joined the software start-up boom, co-founding YY Software funded through substantial venture capital to use the ERG for automated response to customer emails for e-commerce companies. YY produced the first commercially viable software system using an HPSG implementation, processing email content in English with the ERG and the PET parser (Callmeier 2000) which had been developed by Ulrich Callmeier at the DFKI, as well as in Japanese with Jacy, further developed by Siegel and by Bender. While technically capable, the product was not commercially successful enough to enable YY to survive the bursting of the dot-com bubble, and it closed down in 2003. Flickinger returned to the LinGO project with a considerably more robust ERG, and soon picked up the translation application thread again, this time using the ERG for generation in the LOGON Norwegian–English machine translation project (Lønning et al. 2004) based in Oslo.

# **8 Research and teaching networks**

The first international conference on HPSG was held in 1993 in Columbus, Ohio, in conjunction with the Linguistic Society of America's Summer Institute. The conference has been convened every year since then, with locations in Europe, Asia, and North America. Two of these annual meetings have been held jointly with the annual Lexical Functional Grammar conference, in 2000 in Berkeley and in 2016 in Warsaw. Proceedings of these conferences since 2000 are available on-line from CSLI Publications.<sup>5</sup> Since 2003, HPSG researchers in Europe have frequently held a regional workshop in Bremen, Berlin, Frankfurt, or Paris, annually since 2012, to foster informal discussion of current work in HPSG. These follow in the footsteps of European HPSG workshops starting with one on German grammar, held in Saarbrücken in 1991, and including others in Edinburgh and Copenhagen in 1994, and in Tübingen in 1995.

In 1994, the HPSG mailing list was initiated,<sup>6</sup> and from 1996 to 1998, the electronic newsletter, the HPSG Gazette,<sup>7</sup> was distributed through the list, with its function then taken over by the HPSG mailing list.

Courses introducing HPSG to students became part of the curriculum during the late 1980s and early 1990s at universities in Osaka, Paris, Saarbrücken, Seoul, and Tübingen, along with Stanford and OSU. Additional courses came to be offered in Bochum, Bremen, Pittsburgh, Göttingen, Heidelberg, Jena, Leuven, Potsdam, Seattle, Berlin, Essex, Buffalo, and Austin. Summer courses and workshops on HPSG have also been offered since the early 1990s at the LSA Summer Institute in the U.S., including a course by Sag and Pollard on binding and control in 1991 in Santa Cruz, and at the European Summer School in Logic, Language and Information (ESSLLI), including a course by Pollard in Saarbrücken in 1991 on HPSG, a workshop in Colchester in 1992 on HPSG, a workshop in Prague in 1996 on Romance (along with two HPSG-related student papers at the first-ever ESSLLI student session), and courses in 1998 in Saarbrücken on Germanic syntax, grammar engineering, and unification-based formalisms, in 2001 on HPSG syntax, in 2003 on linearization grammars, and more since. Also in 2001, a Scandinavian summer school on constraint-based grammar was held in Trondheim.

Several HPSG textbooks have been published, including at least Borsley (1991; 1996), Sag & Wasow (1999), Sag, Wasow & Bender (2003), Müller (2007a; 2013a; 2020), Kim (2016), and Levine (2017).

<sup>5</sup>http://csli-publications.stanford.edu/HPSG/, 2021-01-15.

<sup>6</sup>Its archives can be found at https://hpsg.hu-berlin.de/HPSG/MailingList.

<sup>7</sup>http://www.sfs.uni-tuebingen.de/~gazette, 2021-01-15.

# **9 Implementations and applications of HPSG**

The first implementation of a grammar in the HPSG framework emerged in the Hewlett-Packard Labs natural language project, for English, with a lexical type hierarchy (Flickinger, Pollard & Wasow 1985), a set of grammar rules that provided coverage of core syntactic phenomena including unbounded dependencies and coordination, and a semantic component called Natural Language Logic (Laubsch & Nerbonne 1991). The corresponding parser for this grammar was implemented in Lisp (Proudian & Pollard 1985), as part of a system called HP-NL (Nerbonne & Proudian 1987) which provided a natural language interface for querying relational databases. The grammar and parser were shelved when HP Labs terminated their natural language project in 1991, leading Sag and Flickinger to begin the LinGO project and development of the English Resource Grammar at Stanford.

By this time, grammars in HPSG were being implemented in university research groups for several other languages, using a variety of parsers and engineering platforms for processing typed feature structure grammars. Early platforms included the DFKI's DISCO system (Uszkoreit et al. 1994) with a parser and graphical development tools, which evolved to the PAGE system; the ALE system (Franz 1990; Carpenter & Penn 1996), which evolved in Tübingen to TRALE (Meurers, Penn & Richter 2002; Penn 2004); and Ann Copestake's LKB (Copestake 2002) which grew out of the ACQUILEX project. Other early systems included ALEP within the Eurotra project (Simpkins & Groenendijk 1994), ConTroll at Tübingen (Götz & Meurers 1997), CUF at IMS in Stuttgart (Dörre & Dorna 1993), CL-ONE at Edinburgh (Manandhar 1994), TFS also at IMS (Emele 1994), ProFIT at the University of Saarland (Erbach 1995), Babel at Humboldt University in Berlin (Müller 1996), and HDrug at Groningen (van Noord & Bouma 1997).

Relatively early broad-coverage grammar implementations in HPSG, in addition to the English Resource Grammar at Stanford (Flickinger 2000), included one for German at the DFKI (Müller & Kasper 2000) and one for Japanese (Jacy: Siegel 2000), all used in the Verb*mobil* machine translation project; a separate German grammar (Müller 1996; 1999); a Dutch grammar in Groningen (Bouma, van Noord & Malouf 2001); and a separate Japanese grammar in Tokyo (Miyao et al. 2005). Moderately large HPSG grammars were also developed during this period for Korean (Kim & Yang 2003) and for Polish (Mykowiecka, Marciniak, Przepiórkowski & Kupść 2003).

In 1999, research groups at the DFKI, Stanford, and Tokyo set up a consortium called DELPH-IN (Initiative for Deep Linguistic Processing in HPSG), to foster broader development of both grammars and platform components, described in Oepen, Flickinger, Tsujii & Uszkoreit (2002). Over the next two decades, substantial DELPH-IN grammars were developed for Norwegian (Hellan & Haugereid 2003), Portuguese (Branco & Costa 2010), and Spanish (Marimon 2010), along with moderate-coverage grammars for Bulgarian (Osenova 2011), Greek (Kordoni & Neu 2005), Hausa (Crysmann 2012), Hebrew (Arad Greshler, Herzig Sheinfux, Melnik & Wintner 2015), Indonesian (Moeljadi et al. 2015), Mandarin Chinese (Fan et al. 2015), Thai, and Wambaya (Bender 2008), all described at http://delphin.net. Several of these grammars are based on the Grammar Matrix (Bender, Flickinger & Oepen 2002), a starter kit generalized from the ERG and Jacy for rapid prototyping of HPSG grammars, along with a much larger set of coursework grammars.<sup>8</sup> Out of this work has grown the linguistically rich Grammar Matrix customization system (Bender, Drellishak, Fokkens, Poulson & Saleem 2010), a set of libraries of phenomena enabling a grammar developer to complete a questionnaire about characteristics of a language to obtain a more effectively customized starting grammar.

Broad-coverage grammars developed in the TRALE system (Meurers et al. 2002; Penn 2004) include German (Müller 2007a), Danish (Müller & Ørsnes 2015), and Persian (Müller 2010). Other TRALE grammars include Mandarin Chinese (Müller & Lipenkova 2013), Georgian (Abzianidze 2011), Maltese (Müller 2009), English (Müller 2018), and Yiddish (Müller & Ørsnes 2011). Development of grammars in TRALE is supported by the Grammix system (Müller 2007b); Müller (2015) provides a summary of this family of grammar implementations.

These grammars and systems have been used in a wide variety of applications, primarily as vehicles for research in computational linguistics, but also for some commercial software products. Research applications already mentioned include database query (HP Labs) and machine translation (Verb*mobil* and LOGON), with additional applications developed for use in anthology search (Schäfer, Kiefer, Spurk, Steffen & Wang 2011), grammar tutoring in Norwegian (Hellan, Bruland, Aamot & Sandøy 2013), ontology acquisition (Herbelot & Copestake 2006), virtual robot control (Packard 2014), visual question answering (Kuhnle & Copestake 2017), and logic instruction (Flickinger 2017), among many others. Commercial applications include e-commerce customer email response (for YY Software), and grammar correction in education (for Redbird Advanced Learning, now part of McGraw-Hill Education: Suppes, Liang, Macken & Flickinger 2014). See Bender & Emerson (2021), Chapter 25 of this volume for further discussion.

<sup>8</sup>http://moin.delph-in.net/MatrixTop, 2021-01-15.

For most practical applications, some approximate solution to the challenge of parse selection (disambiguation) must be provided, so developers of several of the DELPH-IN grammars, including the ERG, follow the approach of Oepen, Flickinger, Toutanova & Manning (2004), which uses a manually-annotated treebank of sentences parsed by a grammar to train a statistical model which is applied at run-time to identify the most likely analysis for each parsed sentence. These treebanks can also serve as repositories of the analyses intended by the grammarian for the sentences of a corpus, and some resources, notably the Alpino Treebank (Bouma, van Noord & Malouf 2001), include analyses which the grammar may not yet be able to produce automatically.
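As a toy illustration of the idea (not the actual DELPH-IN machinery; the feature names and weights are invented), a log-linear model trained on treebank annotations assigns each candidate analysis a score from weighted feature counts, and the highest-scoring parse is selected at run-time:

```python
# A toy parse-selection sketch, assuming each candidate analysis comes
# with counts of derivation features and that weights were previously
# estimated from a manually disambiguated treebank.

WEIGHTS = {"head-comp": 0.9, "head-subj": 0.4, "noun-noun-compound": -1.2}

def score(features: dict) -> float:
    """Log-linear score: the dot product of feature counts and weights."""
    return sum(WEIGHTS.get(f, 0.0) * n for f, n in features.items())

def select(parses: list) -> dict:
    """Return the most likely analysis under the model."""
    return max(parses, key=lambda p: score(p["features"]))

candidates = [
    {"id": 1, "features": {"head-comp": 2, "head-subj": 1}},
    {"id": 2, "features": {"head-comp": 1, "noun-noun-compound": 2}},
]
assert select(candidates)["id"] == 1
```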

# **10 Prospects**

As we noted early in this chapter, HPSG's origins are rooted in the desire simultaneously to address the theoretical concerns of linguists and the practical issues involved in building a useful natural language processing system. In the decades since the birth of HPSG, the mainstream of work in both theoretical linguistics and NLP developed in ways that could not have been anticipated at the time. NLP is now dominated by statistical methods, with almost all practical applications making use of machine learning technologies. It is hard to see any influence of research by linguists in most NLP systems, though periodic workshops have helped to keep the conversation going.<sup>9</sup> Mainstream grammatical theory, on the other hand, is now dominated by the Minimalist Program (MP), which is too vaguely formulated for a rigorous comparison with HPSG.<sup>10</sup> Concern with computational implementation plays virtually no role in MP research; see Müller (2016) for a discussion.

It might seem, therefore, that HPSG is further from the mainstream of both fields than it was at its inception, raising questions about how realistic the objectives of HPSG are. We believe, however, that there are grounds for optimism.

<sup>9</sup>For example, one on "Building Linguistically Generalizable NLP Systems" at the 2017 EMNLP conference in Copenhagen, and one on "Relevance of Linguistic Structure in Neural NLP" at the 2018 ACL conference in Melbourne.

<sup>10</sup>Most work in MP is presented without precise definitions of the technical apparatus, but Edward Stabler and his collaborators have written a number of papers aimed at formalizing MP. See in particular Collins & Stabler (2016). Torr (2019) describes a large-scale implemented fragment in the framework of Minimalist Grammar. See Müller (2020: 177–180) for a comparison of this fragment with HPSG. As Müller points out, many of the implementation techniques employed can be found in HPSG grammars, e.g., discontinuous constituents and the SLASH-based approach to nonlocal dependencies.

With regard to implementations, there is no incompatibility between the use of HPSG and the machine learning methods of mainstream NLP. Indeed, as noted above, HPSG-based systems that have been put to practical use have necessarily included components induced via statistical methods from annotated corpora. Without such components, the systems cannot deal with the full variety of forms encountered in usage data. On the other hand, existing NLP systems that rely solely on machine learning from corpora do not exhibit anything that can reasonably be called understanding of natural language. Current technologies for machine translation, automatic summarization, and various other linguistic tasks fall far short of what humans do on these tasks, and are useful primarily as tools to speed up the tasks for the humans carrying them out. Many NLP researchers are beginning to recognize that developing software that can plausibly be said to understand language will require representations of linguistic structure and meaning like those that are the stock in trade of linguists. See Bender, Flickinger, Oepen, Packard & Copestake (2015) for more discussion on sentence meaning.

Evidence for a renewed interest in linguistics among NLP researchers is the fact that major technology companies with natural language groups have recently begun (or in some cases, resumed) hiring linguists, and increasing numbers of new linguistics PhDs have taken jobs in the software industry.

In the domain of theoretical linguistics, it is arguable that the distance between HPSG and the mainstream of grammatical research (that is, MP) has narrowed, given that both crucially incorporate ideas from Categorial Grammar (see Retoré & Stabler 2004, Berwick & Epstein 1995, and Müller 2013b for comparisons between MP and CG; for a general comparison of MP and HPSG, see also Borsley & Müller 2021, Chapter 28 of this volume). Rather than trying to make that argument, however, we will point to connections that HPSG has made with other work in theoretical linguistics. Perhaps the most obvious of these is the work of Peter Culicover and Ray Jackendoff on what they call *Simpler Syntax*. Their influential 2005 book with that title (Culicover & Jackendoff 2005) argues for a theory of grammar that differs little in its architecture and motivations from HPSG.

More interesting are the connections that have been forged between research in HPSG and work in Construction Grammar (CxG). Fillmore (1988: 36) characterizes the notion of *construction* as "any syntactic pattern which is assigned one or more conventional functions in a language, together with whatever is linguistically conventionalized about its contribution to the meaning or use of structures containing it." Among the examples that construction grammarians have described at length are *the Xer, the Yer* (as in *the older I get, the longer I sleep*), *X let alone Y* (as in *I barely got up in time to eat lunch, let alone cook breakfast*), and *What's X doing Y?* (as in *What's this scratch doing in the table?*). As noted above and in Müller (2021c: 1497, 1506), Chapter 32 of this volume, HPSG has incorporated the notion of construction since at least the late 1990s.

Nevertheless, work that labels itself CxG tends to look very different from HPSG. This is in part because of the difference in their origins: many proponents of CxG come from the tradition of Cognitive Grammar or typological studies, whereas HPSG's roots are in computational concerns. Hence, most of the CxG literature is not precise enough to allow a straightforward comparison with HPSG, though the variants called Embodied Construction Grammar and Fluid Construction Grammar have more in common with HPSG; see Müller 2017; 2020: Sections 10.6.3–10.6.4 for a comparison. In the last years of his life, Ivan Sag sought to unify CxG and HPSG through collaboration with construction grammarians from the University of California, Berkeley, particularly Charles Fillmore, Paul Kay, and Laura Michaelis. They developed a theory called *Sign-Based Construction Grammar* (SBCG), which would combine the insights of CxG with the explicitness of HPSG. Sag (2012: 70) wrote, "To readers steeped in HPSG theory, SBCG will no doubt seem like a minor variant of constructional HPSG." Indeed, despite the name change, the main feature of SBCG that differs from HPSG is that it posits an inheritance hierarchy of constructs, which includes feature structure descriptions for such partially lexicalized multi-word expressions as *Ved X's way PP*, instantiated in such VPs as *ad-libbed his way through a largely secret meeting*. While this is a non-trivial extension to HPSG, there is no fundamental change to the technical machinery. In fact, it has been a part of the LinGO implementation for many years.

That said, there is one important theoretical issue that divides HPSG and SBCG from much other work in CxG. That issue is locality. To constrain the formal power of the theory, and to facilitate computational tractability, SBCG adopts what Sag (2012: 150) calls "Constructional Localism" and describes it as follows: "Constructions license mother-daughter configurations without reference to embedding or embedded contexts." That is, like phrase structure rules, constructions must be characterized in terms of a mother node and its immediate daughters. At first glance, this seems to rule out analyses of many of the examples of constructions provided in the CxG literature. But Sag (2012: 150) goes on to say, "Constructional Localism does not preclude an account of nonlocal dependencies in grammar, it simply requires that all such dependencies be locally encoded in signs in such a way that information about a distal element can be accessed locally at a higher level of structure."

Fillmore (1988: 35) wrote:

Construction grammars differ from phrase-structure grammars which use *complex symbols* and allow the *transmission of information* between lower and higher structural units, in that we allow the direct representation of the required properties of subordinate constituents. (Should it turn out that there are completely general principles for predicting the kinds of information that get transmitted upwards or downwards, this may not be a real difference.) (Fillmore 1988: 35)


SBCG is committed to the position alluded to in the parenthetical sentence in this quote, namely, that general principles of information transmission within sentences make it possible to insist on Constructional Localism. See Müller (2021c), Chapter 32 of this volume for a much more detailed discussion, and Van Eynde (2015) for a review of the 2012 SBCG book.

Finally, another point of convergence between work in HPSG and other work in both theoretical linguistics and NLP is the increasing importance of corpus and experimental data. In the early years of the HP NL project, the methodology was the same as that employed in almost all work in theoretical syntax and semantics: the grammar was based entirely on examples invented by the researchers. At one point during the decade of the HP NL project, Flickinger, Pullum, and Wasow compiled a list of sentences intended to exemplify many of the sentence types that they hoped the system would eventually be able to analyze. That list, 1328 sentences long, continues to be useful as a test suite for the ERG, and is also used by various other NLP groups. But it does not come close to covering the variety of sentence forms that are found in corpora of speech and various written genres. As the goals of the HPSG implementations have broadened from database query to dealing with "language in the wild", the use of corpora to test such systems and motivate extensions to them has increased. This parallels a development in other areas of linguistics, which have also increasingly made use of large on-line corpora as sources of data and tests of their theories. This is a trend that we expect will continue.

Experimental data has been particularly important in the exploration of whether well-known constraints on phenomena like extraction or ellipsis are really due to the grammar of natural languages or the convergence of frequency, discourse factors, and aspects of human sentence processing. Hofmeister & Sag (2010), Chaves & Dery (2019), and Chaves & Putnam (2020), for example, have argued that many so-called island constraints are not grammatical in nature. Similarly, Shiraïshi et al. (2019) claim that some parallelism effects in Right Node Raising are not grammatical in nature. Both lines of research lead to a reduction of what grammars are responsible for and question the traditional division of labor between the grammatical system, properties of the discourse within which utterances are embedded, and processing considerations. We expect work along these lines to continue in the future (see also Wasow (2021), Chapter 24 of this volume for the relation between HPSG and work in sentence processing).

In short, there are signs of convergence between work on HPSG and work in other areas, and it seems plausible to think that the market for HPSG research will grow in the future.


# **Acknowledgments**

The work on this chapter by Flickinger was generously supported by a fellowship at the Oslo Center for Advanced Study at the Norwegian Academy of Science and Letters. The authors also thank several reviewers for their insightful and detailed comments on drafts of the chapter, including Emily M. Bender, Robert Borsley, Danièle Godard, Jean-Pierre Koenig and Stefan Müller.

# **References**




Chomsky, Noam. 1965. *Aspects of the theory of syntax*. Cambridge, MA: MIT Press.










# **Chapter 3**

# **Formal background**

# Frank Richter

Goethe Universität Frankfurt

This chapter provides a very condensed introduction to a formalism for Pollard & Sag (1994) and explains its fundamental concepts. It pays special attention to the model-theoretic meaning of HPSG grammars. In addition, it points out some links to other, related formalisms, such as feature logics of partial information, and to related terminology in the context of grammar implementation platforms.

Frank Richter. 2021. Formal background. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 89–124. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599822

# **1 Introduction**

The two HPSG books by Pollard & Sag (1987; 1994) do not present grammar formalisms with the intention to provide precise definitions. Instead they refer to various inspirations in the logics of typed feature structures or in predicate logic, informally characterize the intended formalisms, and explain them as they are used in concrete grammars of English. Pollard & Sag (1994) further clarify their intentions in an appendix which lists most (but not all) of the components of their grammar of English explicitly, and summarizes most of their core assumptions. With this strategy, both books leave room for interpretation.

There are a number of challenges with reviewing the formal background of HPSG. Some of them have to do with the long publication history of relevant papers and books, some with the considerable influence of grammar implementation platforms, which have their own formalisms and shape the way in which linguists think and talk about grammars with their platform-specific terminology and notational conventions. Salient examples include convenient notations for phrase structure rules, the treatment of lexical representations or the lexicon, mechanisms for lexical rules, and notations for default values, among many other devices. Many of these notations are well-known in the HPSG community; they are convenient, compact and arguably even necessary to write readable grammars. At the same time, they are a meta-notation in the sense that they do not (directly) belong to the syntax of the assumed feature logics. However, even if they are outside a declarative, logical formalism for HPSG, there is usually a way to interpret them in HPSG-compatible formalisms, but the necessary re-interpretation can deviate to a larger or lesser degree from what their users have in mind when they write their grammars. For example, a phrase structure rule in the sense of a context-free or context-sensitive rewrite system is not the same as an ID Schema written in a feature logic, which might matter in some cases but not in others. To name one difference, an ID Schema may easily leave the number of a phrase's daughters unspecified (and thus potentially infinite). The differences may be sometimes subtle and sometimes significant, but they entail that the meaning of the notations, seen through the lens of logic, is not what their users might assume based either on their meaning in other contexts or on the behavior of a given implementation platform for parsing or generation which employs that kind of syntax. Similarly, terminology that belongs to the computational environment of implementations is often transferred to grammar theory, and again, when checking the technical specifics, a re-interpretation in terms of a feature-logical HPSG formalism can sometimes be trivial and sometimes nearly impossible, and different available re-interpretation choices lead to significantly different results.

Reviewing HPSG's formal background, it is not only the multi-purpose character and flexibility of the ubiquitous informal attribute-value matrix (AVM) notation and its practical notational enhancements (for lexical rules, decorated sort hierarchies, phrase structure trees, etc.) that one needs to be aware of, but also early changes in foundational assumptions and terminology. When first presented in a book in 1987, HPSG was conceived of as a *unification-based* grammar theory, a name, the authors explain, which "arises from the algebra that governs partial information structures" (Pollard & Sag 1987: 7). This algebra was populated by partial feature structures with unification as a fundamental algebraic operation. In the framework envisioned seven years later in Pollard & Sag (1994), that algebra did not exist anymore, feature structures were no longer partial but total objects in models of a logical theory, and unification was no longer defined in the new setting (as the relevant algebra was gone). However, most of the notation and considerable portions of the terminology of 1987 remain with us to this day, such as the *types* of feature structures (replaced by *sorts* in 1994, when the term *type* was used for a different concept, to be discussed below), the pieces of information (for 1987-style feature structures) or even the word *unification*, which took on a casual life of its own without the original algebra in which it had been defined. Occasionally these words still have a precise technical interpretation in the language of grammar implementation environments or in their run-time system, which may reinforce their use in the community despite their lack of meaning in the standard formalism of HPSG. Implementation platforms also often add their own technical and notational devices, thereby inviting linguists to import them as useful tools into their theoretical grammar writing.

This handbook article cannot disentangle the history of and relationships between the various formalisms leading to an explication of the 1994 version of HPSG, nor of those that existed and still exist in parallel. It sets out to clarify the terminology and structure of a formalism for Pollard & Sag (1994) and presents a canonical formalism for that final version of HPSG. Only occasionally will it point out some of the differences from its 1987 precursor where the older terminology is still present in current HPSG papers and may be confusing to an audience unaware of the different usages of terms. Similarly, it does not cover the HPSG variant Sign-Based Construction Grammar (SBCG; Sag 2012; Müller 2021: Section 1.3.2, Chapter 32 of this volume).

The main sources of the present summary are the model theories for HPSG by King (1999) and Pollard (1999), and their synoptic reconstruction on the basis of a comprehensive logical language for HPSG, *Relational Speciate Re-entrant Language* (RSRL) by Richter (2004), including the critique and extensions sketched in Richter (2007). Section 2 gives a largely non-technical introductory overview which should provide sufficient background to follow all linguistic chapters of the present handbook. The subsequent sections (3–6) introduce RSRL and are for readers keen on obtaining a deeper understanding or looking for clarification of what might remain vague and imprecise in an initial broad overview. Those sections might be more challenging for the casual reader, but in return offer a fairly self-contained and comprehensive summary, omitting only the mathematical groundwork and definitions needed to spell out alternative model-theories, as this goes beyond what can reasonably be compressed to handbook format.

# **2 Essentials: An informal overview**

This section presents an informal summary of the essentials of an HPSG formalism in the sense of Pollard & Sag (1994) as it emerged from their original outline and its subsequent elaboration. From here on, the term "HPSG formalism" always refers to this tradition, unless explicitly stated otherwise. All later sections in this chapter will flesh out the basic ideas introduced here with a precise technical treatment of the relevant notions. Readers who are already familiar with feature logics and are specifically interested in technical details may want to skip ahead to Section 3.

At the heart of HPSG is a fundamental distinction between descriptions and described objects: a grammar avails itself of descriptions with the purpose of describing linguistic objects. Pollard & Sag (1994: 17–18, 396) commit to the ontological assumption that linguistic objects only exist as complete objects. Partial linguistic objects do not exist. Descriptions of linguistic objects, however, are typically *partial*, i.e. they do not mention many, or even most, properties of the objects in their denotation. They are *underspecified*. A word can be described as being nominal and plural, leaving all its other properties (gender, case, number and category of its arguments, etc.) unspecified. But any concrete word being so described will have all other properties that a plural noun can have, with none of them missing. A single underspecified description can therefore describe many distinct linguistic objects. Grammatical descriptions often describe an infinity of objects. Again considering plural nouns, English can be thought of as having a very large number or an infinity of them due to morphological processes such as compounding, depending on the choice of morphological analysis.

Descriptions are couched in a (language of a) feature logic rather than in English for precision. Linguistic objects as the subject of linguistic study are sharply distinguished from their logical descriptions and are entities in the denotation of the grammatical descriptions. The feature logic of HPSG can be seen as a particularly expressive variant of description logics. With this architecture, HPSG is a *model-theoretic* grammar framework as opposed to *generative-enumerative* grammar frameworks, which have rewrite systems that generate expressions from some start symbol(s) (Pullum & Scholz 2001).

A small digression might be in order to prevent confusion arising from the coexistence of different versions of feature logics. Varieties of HPSG more closely related to the tradition of Pollard & Sag (1987) do not make the same distinction between descriptions and described objects. Instead they employ a notion of *feature structures* as entities carrying *partial information*. These partial feature structures are, or correspond to, logical expressions in a certain normal form and are ordered in an algebra of partial information according to the amount of information they carry. In informal notation, they are written as AVMs just like the descriptions of the formalism we are presently concerned with, and this notational similarity contributes to obscuring substantial differences. When two partial feature structures carry compatible information, they are said to be unifiable. Their unification returns a unique third feature structure in the given algebra that carries the more specific information that is obtained when combining the previous two pieces of information (supposing they were not the same to begin with). These ideas and the properties of algebras employed by feature logics of partial information are still essential for all current HPSG implementation platforms (see Bender & Emerson 2021, Chapter 25 of this volume), which is presumably one of the reasons why the terminology of unification and unification-based grammars is still popular in the HPSG community. Returning to Pollard & Sag (1994), in a certain informal and casual sense, combining two non-contradictory descriptions into one single bigger description by logical conjunction could be called – and often is called – their unification. However, since the logical descriptions of HPSG in the tradition of Pollard & Sag (1994) can no longer be arranged in an appropriate algebra, there is no technical interpretation of the term in this context.<sup>1</sup>
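For concreteness, the following minimal sketch models 1987-style partial feature structures as nested Python dicts with atomic values, and unification as the combination of compatible information; it deliberately ignores sorts, reentrancies and the algebraic setting, so it is an illustration of the idea only:

```python
# A minimal sketch of unification over partial feature structures,
# assuming structures are nested dicts whose leaves are atoms.

def unify(a, b):
    """Return the structure carrying the combined information of a and
    b, or None if they carry incompatible information."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attr, val in b.items():
            if attr in result:
                u = unify(result[attr], val)
                if u is None:
                    return None       # clash on a shared attribute
                result[attr] = u
            else:
                result[attr] = val    # b is more informative here
        return result
    return None                       # two distinct atoms clash

assert unify({"HEAD": "noun"}, {"NUMBER": "plural"}) == \
       {"HEAD": "noun", "NUMBER": "plural"}
assert unify({"HEAD": "noun"}, {"HEAD": "verb"}) is None
```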

HPSG employs partial descriptions in all areas of grammar, comprising at least phonology (Höhle 1999, but also Bird & Klein 1994 and Walther 1999), morphology (Crysmann 2021, Chapter 21 of this volume), syntax, semantics (Koenig & Richter 2021, Chapter 22 of this volume) and pragmatics (De Kuthy 2021, Chapter 23 of this volume; Lücking, Ginzburg & Cooper 2021, Chapter 26 of this volume). The descriptions are normally notated as AVMs and contain sort symbols (by convention in italics with lower case letters) and attribute symbols (in small caps). These are augmented by the standard logical connectives (conjunction, disjunction, negation and implication) and relation symbols. So-called tags, boxed numbers, function as variables. (1) shows a typical example in which *word*, *noun* and *plural* are sorts and SYNSEM, LOCAL, CATEGORY, etc. are attributes.<sup>2</sup> The AVM is a description of plural nouns.

$$\text{(1)}\quad \begin{bmatrix} \textit{word}\\ \text{SYNSEM|LOCAL}\ \begin{bmatrix} \text{CATEGORY|HEAD}\ \textit{noun}\\ \text{CONTENT|INDEX|NUMBER}\ \textit{plural} \end{bmatrix} \end{bmatrix}$$

A description such as (1) presupposes a declaration of the admissible nonlogical symbols: as in any formal logical theory, the vocabulary of the formal language in which the logical theory is written must be explicitly introduced as the *alphabet* of the language, together with a set of logical symbols. This means that the sorts, attributes and relation symbols must be listed. HPSG goes beyond merely stating the nonlogical vocabulary as sets of symbols by imposing additional structure on the set of sorts and on the relationship between sorts and attributes. This additional structure is known as the *sort hierarchy* and the *feature* (*appropriateness*) *declarations*.

<sup>1</sup>This state of affairs is also responsible for the fact that implementation platforms often provide only a restricted syntax of descriptions and may also supply additional syntactic constructs which extend their logic of partial information toward the expressiveness of a feature logic with classical interpretation of negation and relational expressions.

<sup>2</sup>Tags, relations and logical connectives in descriptions will be illustrated later, in (3).

The sort hierarchy and the feature declarations essentially provide the space of possible structures of the linguistic universe that an HPSG grammar talks about with its grammar principles. Metaphorically speaking, they generate a space of possible structures which is then constrained to the actual, well-formed structures which a linguist deems the grammatical structures of a language. The interaction between sort hierarchy and feature declarations is regulated by assumptions about feature inheritance and feature value inheritance. This can best be explained with a small example, using the tiny (and slightly modified) fragment from the sort hierarchy and feature appropriateness of Pollard & Sag (1994) shown in Figure 1.

Figure 1: Example of sort hierarchy with feature declarations

According to Figure 1, a top sort *object* is the highest sort with immediate subsorts *substantive*, *case*, *vform* and *boolean*. The two sorts *substantive* and *boolean* have their own immediate subsorts: *verb* and *noun*, and *plus* and *minus*, respectively. All are subsorts of *object*. The six sorts *verb*, *noun*, *case*, *vform*, *plus* and *minus* are *maximally specific* in this hierarchy, because they do not have proper subsorts. Such sorts are called *species*. The four sorts *case*, *vform*, *plus* and *minus* are also called *atomic*, because they are species and they do not have attributes appropriate to them.

Figure 1 contains nontrivial feature declarations for the sorts *substantive*, *verb* and *noun*, and it also illustrates the idea behind feature inheritance. First of all, *verb* and *noun* have attributes which are only appropriate to them but to no other sort: VFORM is only appropriate to *verb*, and CASE is only appropriate to *noun*. But there is one more attribute appropriate to both due to feature inheritance: the attribute PRD is declared appropriate to *substantive*, and appropriateness declarations are inherited by subsorts, so PRD is also appropriate to *verb* and *noun*. The sort *noun* inherits the declaration unchanged from *substantive*.

Finally, we have to consider attribute values and their inheritance mechanism. Whereas attributes are called appropriate *to* a sort, I call a sort appropriate *for* an attribute at a given sort when talking about attribute values. For example, the non-maximal sort *boolean* is declared appropriate for the attribute PRD at *substantive*. This value declaration is also inherited by the subsorts, with a slight twist to it: at any subsort, the value for an attribute can become *more specific* (but not less specific) than at its supersort(s), and this is what happens here at the subsort *verb* of *substantive*. At *verb* the value of PRD must be one particular subsort of *boolean*, namely *plus*.<sup>3</sup>

A further crucial aspect of the sort hierarchy and the feature declarations is their significance for the meaning of grammars. Structures in the denotation of a grammar must fulfill all their combined restrictions plus the constraints imposed by all grammar principles. Every denoted object must be of a maximally specific sort, i.e. Figure 1 allows only objects of the six species in the hierarchy. In addition, all attributes declared appropriate for a species (possibly by inheritance) must be present on objects of that species, with the values of course also obeying the feature declarations and being maximally specific. For example, an object of sort *noun* has CASE and PRD properties. The object that is the CASE value must be of sort *case* (because *case* is a species in the present example, unlike in real grammars where *case* has subsorts), and the sort of the PRD value must be either *plus* or *minus*, one of the two species which are maximally specific subsorts of *boolean*. With these restrictions, specifications like in Figure 1 determine the ontology of possible structures in the denotation of a grammar. The possible structures are further narrowed down by the grammar principles, leaving the well-formed structures as the predictions of a grammar.
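The signature of Figure 1 is small enough to encode directly. The following Python sketch (a hypothetical encoding, not the formalism of any implementation platform) computes inherited appropriateness and checks that a structure is well-typed and sort-resolved in the sense just described:

```python
# A toy encoding of the Figure 1 signature, assuming sorts and
# attributes are strings and structures are (sort, features) pairs.

SUBSORTS = {
    "object": {"substantive", "case", "vform", "boolean"},
    "substantive": {"verb", "noun"},
    "boolean": {"plus", "minus"},
}
DECLARED = {                       # locally declared appropriateness
    "substantive": {"PRD": "boolean"},
    "verb": {"VFORM": "vform", "PRD": "plus"},   # PRD value narrowed
    "noun": {"CASE": "case"},
}
SPECIES = {"verb", "noun", "case", "vform", "plus", "minus"}

def supersorts(sort: str) -> set:
    """All sorts at or above the given sort in the hierarchy."""
    result = {sort}
    for parent, kids in SUBSORTS.items():
        if sort in kids:
            result |= supersorts(parent)
    return result

def appropriate(sort: str) -> dict:
    """Inherited feature declarations; declarations at more specific
    sorts override (narrow) those inherited from supersorts."""
    feats = {}
    for s in ("object", "substantive", "verb", "noun"):  # general first
        if s in supersorts(sort) and s in DECLARED:
            feats.update(DECLARED[s])
    return feats

def well_typed(struct) -> bool:
    """Every node is of a species and carries exactly its appropriate
    attributes, with values at or below the declared value sort."""
    sort, feats = struct
    need = appropriate(sort)
    if sort not in SPECIES or set(feats) != set(need):
        return False
    return all(need[a] in supersorts(v[0]) and well_typed(v)
               for a, v in feats.items())

a_noun = ("noun", {"CASE": ("case", {}), "PRD": ("plus", {})})
assert well_typed(a_noun)
assert appropriate("verb") == {"PRD": "plus", "VFORM": "vform"}
```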

This is a good opportunity to reconsider underspecified descriptions. With the sort hierarchy and feature declarations of Figure 1, there are a number of ways to underspecify the description of structures of sort *noun*. All following AVMs describe the same structures but differ in their degree of explicitness:

<sup>3</sup>The *plus* value for PRD at verbs is introduced here to create a useful example; it is not usually found in grammars.


$$\text{(2)}\quad
\begin{aligned}
&\text{a. } \begin{bmatrix} \textit{noun}\\ \text{CASE}\ \textit{case}\\ \text{PRD}\ \textit{plus} \lor \textit{minus} \end{bmatrix}
\qquad \text{b. } \begin{bmatrix} \textit{noun} \end{bmatrix}
\qquad \text{c. } \begin{bmatrix} \textit{noun}\\ \text{CASE}\ \textit{object} \end{bmatrix}
\qquad \text{d. } \begin{bmatrix} \textit{object}\\ \text{CASE}\ \textit{object} \end{bmatrix}\\[6pt]
&\text{e. } \begin{bmatrix} \textit{noun}\\ \text{CASE}\ \textit{case}\\ \text{PRD}\ \textit{plus} \end{bmatrix} \lor \begin{bmatrix} \textit{noun}\\ \text{CASE}\ \textit{case}\\ \text{PRD}\ \textit{minus} \end{bmatrix}
\end{aligned}$$

All AVMs in (2) denote the same two configurations as the fully specific AVM description in (2a): two *noun* structures with the CASE property *case* and the PRD property *plus* or the PRD value *minus*. But a description of these structures can be underspecified in many different ways. For *noun* structures in general (2b), the two just described are the only two structural choices, as can be verified by inspecting Figure 1. The description could mention in addition to what (2b) says that the structures have a CASE property, leaving its value underspecified (2c), but that does not make a difference with respect to the shape of the structures satisfying the description. Moreover, the only *object*s with CASE (2d) are nouns, but since that leaves exactly the two possible PRD values *plus* and *minus*, (2d) is yet another way to underspecify the two structures which (2a) describes exhaustively. Omitting the sort symbol *object* in the upper left-hand corner of (2d) would in fact be one more way to describe all nouns to the exclusion of everything else, because saying that something is an *object* does not restrict the range of choices. Finally, the disjunction embedded in the AVM in (2a) can be lifted to the top level of the description, yielding (2e).
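This equivalence can be checked mechanically. The toy satisfaction checker below (a hypothetical encoding continuing the style of the previous sketch, but self-contained) verifies that the underspecified descriptions (2b) and (2d) denote exactly the two structures exhaustively described by (2a):

```python
# A toy satisfaction check, assuming structures are (sort, features)
# pairs and descriptions use possibly non-maximal sorts.

STRUCTURES = [
    ("noun", {"CASE": ("case", {}), "PRD": ("plus", {})}),
    ("noun", {"CASE": ("case", {}), "PRD": ("minus", {})}),
    ("verb", {"VFORM": ("vform", {}), "PRD": ("plus", {})}),  # contrast
]
SUPERSORTS = {
    "noun": {"noun", "substantive", "object"},
    "verb": {"verb", "substantive", "object"},
    "case": {"case", "object"}, "vform": {"vform", "object"},
    "plus": {"plus", "boolean", "object"},
    "minus": {"minus", "boolean", "object"},
}

def satisfies(struct, desc) -> bool:
    """A structure satisfies a description iff its sort lies at or
    below the described sort and all described attributes match."""
    (sort, feats), (dsort, dfeats) = struct, desc
    return dsort in SUPERSORTS[sort] and all(
        a in feats and satisfies(feats[a], d) for a, d in dfeats.items())

desc_2b = ("noun", {})
desc_2d = ("object", {"CASE": ("object", {})})
denotation = lambda d: [s for s in STRUCTURES if satisfies(s, d)]
assert denotation(desc_2b) == denotation(desc_2d) == STRUCTURES[:2]
```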

Grammar principles are descriptions which every structure is supposed to obey, together with all its substructures. The Head Feature Principle, shown in (3a), is a frequent example. Every phrase whose syntax is a headed phrase (*headed-phrase*) is such that its HEAD value equals the HEAD value of its head daughter, indicated by the repeated occurrence of tag $\boxed{1}$ as the value of the two HEAD features. Every structure which is described by the AVM to the left of the implication symbol (in this case simply a sort, but see (3d)) must also fulfill the requirements in the AVM to its right. If something is not a *headed-phrase*, it is not restricted by the Head Feature Principle because it is not described by the antecedent of the principle. At the same time, a structure which is not described by *headed-phrase* still satisfies the Head Feature Principle as an implicational statement. For example, the SYNSEM value of each phrase is usually assumed to be an object of sort *synsem*, i.e. it is not a phrase of sort *headed-phrase*. As a *synsem* object, it is not described by the antecedent of (3a), thereby still fulfilling the principle. In classical logic, $\alpha \rightarrow \beta$ is equivalent to $\neg\alpha \lor \beta$, so something that is not an $\alpha$ satisfies $\neg\alpha \lor \beta$. This is highly relevant for the ultimate idea that a structure is only licensed by an HPSG grammar when it is well-formed in all its components with respect to all the grammar principles: every component of each structure that is described by the antecedent of a grammar principle also obeys what the consequent of the principle requires, or a given component of the structure is licensed by not being described by the antecedent of the given principle.

The tag 1 signals the identity of the value found at the end of the two distinct attribute paths leading to its occurrences. This state of affairs is often referred to as *token identity*. In the Head Feature Principle, the tag notation could be an informal notation for a *path equation*, or it could mean that 1 plays the role of a variable. The description language of Sections 3–4 offers both options for rendering such occurrences of tags in the syntax of RSRL.

$$\begin{aligned}
\text{(3)} \quad \text{a. } & \textit{headed-phrase} \Rightarrow \begin{bmatrix} \text{SYNSEM|LOCAL|CATEGORY|HEAD} & \boxed{1} \\ \text{HEAD-DTR|SYNSEM|LOCAL|CATEGORY|HEAD} & \boxed{1} \end{bmatrix} \\
\text{b. } & \textit{word} \Rightarrow (\text{LE}_1 \lor \text{LE}_2 \lor \dots \lor \text{LE}_n) \\
\text{c. } & \textit{sign} \Rightarrow \begin{bmatrix} \text{SYNSEM|LOCAL} & \begin{bmatrix} \text{QSTORE} & \boxed{1} \\ \text{POOL} & \boxed{2} \end{bmatrix} \\ \text{RETRIEVED} & \boxed{3} \end{bmatrix} \land \text{set-of-elements}(\boxed{3}, \boxed{4}) \land \boxed{4} \subseteq \boxed{2} \land \boxed{1} = \boxed{2} - \boxed{4} \\
\text{d. } & \begin{bmatrix} \text{RETRIEVED} & \textit{nelist} \end{bmatrix} \Rightarrow \dots
\end{aligned}$$

The licensing of words by the grammar can also be understood as a consequence of a grammar principle with the shape of an implication. (3b) is known as the Word Principle (Höhle 2019: 500). $\text{LE}_1$ to $\text{LE}_n$ in (3b) are the *lexical entries* of the grammar, descriptions of words. If an object is a word, it must be described by (at least) one of the disjuncts in the consequent of the Word Principle.

The semantic principle in (3c), taken from Pollard & Yoo (1998: 420),<sup>4</sup> illustrates one more syntactic construct of HPSG's description language, relations. The consequent of the principle consists of an AVM description conjoined with

<sup>4</sup>This principle is also discussed in the semantics chapter, Koenig & Richter (2021: 1008), Chapter 22 of this volume.


three relational expressions. Relations in HPSG often occur in connection with lists and sets, and so do the relations here: the binary relation set-of-elements relates the RETRIEVED value (a list) to a set 4 containing the elements on list 3 such that the set value 2 of POOL is a superset of 4 (using the subset relation), and the set value 1 of QSTORE contains those elements of 2 which are not on the RETRIEVED list (using set difference). In other words, each element of POOL is either in QSTORE or a list element on RETRIEVED, and nothing else is in QSTORE or on the RETRIEVED list.<sup>5</sup>

The grammar principle (3d), which is also from Pollard & Yoo (1998: 421), is a case of a principle with a complex description in the antecedent, unlike (3a)–(3c), in which the antecedent consists of a sort symbol. Any kind of description may serve as antecedent of a grammar principle.

An HPSG grammar is a signature consisting of a sort hierarchy, feature appropriateness declarations and relation symbols, together with a set of grammar principles. The meaning of the grammar is given by a class of structures (linguistic objects) which obey the structural restrictions of the signature and are completely well-formed with respect to the grammar principles. The nature of the *linguistic objects* and how the relevant models of an HPSG grammar should be conceived of has been subject to intense discussion. Pollard & Sag (1994: 8–9) think of them as *types* and want to construct them as a set of totally well-typed and sort-resolved abstract feature structures. Each such type is supposed to correspond to the set of token occurrences of the same utterance. For example, in this view, the English utterance *Breakfast is ready*, which may occur as a concrete utterance token at different places and at different times, always belongs to the unique type *Breakfast is ready*, rendered as an abstract feature structure licensed by an HPSG grammar of English.

All HPSG model theories after Pollard & Sag (1994) give up the idea of postulating types as objects in the intended grammar model and do not construct models which are populated with feature structures.<sup>6</sup> King (1999) suggests

<sup>5</sup>One additional interesting property of this principle concerns the set designated by tag <sup>4</sup> . Structures described by the consequent of the principle do not necessarily contain an attribute with the set value 4 . However, the list 3 and the sets 1 and 2 are all attribute values which are restricted in (3c) by reference to set 4 . Such constellations motivate the introduction of *chains* in the description language. Chains model lists (or sets) of objects that are not themselves attribute values, but whose members are (see Section 3 for the syntax and Section 4 for the semantics of chains). 4 is best described as a chain.

<sup>6</sup>Pollard (1999: 294) still uses the term *feature structure*, but it is applied to a special kind of *interpretation* in the sense of Definition 7. See the more detailed characterization of these structures in Section 6 below.


*exhaustive models*, collections of possible language tokens. Whereas two types are always distinct, linguistic tokens in exhaustive models can be isomorphic when they are different token occurrences of the same utterance. Pollard (1999) rejects the idea that models contain possible tokens and essentially uses a variant of King's exhaustive models for constructing sets of unique mathematical idealizations of linguistic utterances: any well-formed utterance finds its structurally isomorphic unique counterpart in this model, called the *strong generative capacity* of the grammar. The relationship between the elements of the strong generative capacity and empirical linguistic events is much tighter than it is for Pollard and Sag's object types: for the former, it is a relationship of structural isomorphism, for the latter it is only a conventional notion of correspondence. Moreover, Pollard's models avoid an ontological commitment to the reality of types. Richter (2007) points out shortcomings of the postulated one-to-one correspondence between linguistic types (Pollard & Sag 1994) or mathematical idealizations (Pollard 1999) and the groups of linguistically indistinguishable utterances they are supposed to represent (e.g. the group of realizations of *Breakfast is ready*). The failure to achieve the intended one-to-one correspondence is due to technical properties of the structure of the respective models and to imprecisions of actual HPSG grammar specifications, and the two factors are partially independent. Richter (2007) suggests schematic amendments to grammars (by a small set of axioms and an extended signature), leading to *normal form grammars* whose *minimal exhaustive models* exhibit the intended one-to-one correspondence between structural configurations in the model and (groups of linguistically indistinguishable) empirically observable utterance events. Despite being a certain kind of exhaustive model, minimal exhaustive models are not token models and do not suffer from the problematic concept of *potential token* models which is characteristic of King's approach.

HPSG as a model-theoretic grammar framework provides linguists with an expressive class of logical description languages. Their semantics makes it possible to investigate closely the predictions of a given set of grammar principles and the internal and mutual consistency of different modules of grammar. At a more foundational level, HPSG is exceptional with its alternative characterizations of the meaning of grammars based on one and the same set of core definitions of the syntax and semantics of its descriptive devices. This common core in the service of philosophically different approaches to the scientific description of human languages makes their respective advantages and disadvantages comparable within one single framework, and it renders the discussion of very abstract concepts from the philosophy of science unusually concrete. Alternative approaches to grammatical meaning based on different views of the nature of scientific description of an empirical domain can be investigated and compared with a degree of detail that is hardly achieved elsewhere in linguistics.

The structure of the remainder of this chapter is as follows: Section 3 turns to the syntax of RSRL, defines signatures with sort hierarchies and feature appropriateness for the non-logical vocabulary, and introduces terms and formulæ as expressions. A subclass of formulæ is called descriptions and corresponds to the informal AVMs augmented with logical connectives and relational expressions which we saw above in (1)–(3). Section 4 furnishes the syntactic expressions with a semantics similar to what is familiar from classical logic, except that formulæ and descriptions denote sets of objects rather than truth values. Section 5 turns to the meaning of grammars, taking King's exhaustive models as a concrete example of the four explications outlined above, since it is technically the easiest to define. The final section (Section 6) outlines how the other three approaches to the meaning of HPSG grammars differ from King's possible token models without fully defining all constructs they involve.

The function of Sections 3–6 is thus to spell out in more depth what the present section summarized in much broader strokes. Readers who do not wish to pursue HPSG's formal foundations further can stop here without missing anything fundamentally new.

# **3 Signatures and descriptions**

As logical theories of entities in a domain of objects, HPSG grammars consist of two main components. First, a logical signature, which provides the symbols for describing the domain of interest, in this case a natural language. And second, an exact delineation of all and only the legitimate entities in the denotation of the grammar, written as a collection of statements about their configuration. These statements are descriptions within a logical language and are composed from logical constants, variables, quantifiers, brackets and the symbols provided by the signature. They are variously known to linguists as principles of grammar, constraints, or rules. In the following, I will use the term *principles* to designate these statements. Linguists often use abbreviatory conventions for conceptually distinguished groups of principles, such as grammar rules, lexical entries, or lexical rules. From a logical perspective, then, a grammar is a pair consisting of a signature and a collection of principles. The appendix of Pollard & Sag (1994) provides an early example in HPSG of this conception.


Signatures in HPSG go beyond supplying non-logical symbols for descriptions: they impose additional restrictions on the organization of the non-logical symbols. These restrictions ultimately have an effect on how the domain of described objects is structured. Let us first investigate the two most prominent sets of non-logical symbols: sorts and attributes. The set of *sort* symbols is arranged in a *sort hierarchy*, and that sort hierarchy is in turn connected to the set of *attribute* symbols (also known as *features*). The sort hierarchy is a partial order,<sup>7</sup> and attributes are declared *appropriate to* sorts in the sort hierarchy. This appropriateness declaration must not be entirely random: if an attribute is declared appropriate to some sort, it must also be declared appropriate to all its subsorts. This requirement is known as *feature inheritance*.<sup>8</sup> Moreover, for each sort $\sigma$ and attribute $\alpha$ such that $\alpha$ is appropriate to $\sigma$, some other sort $\sigma'$ is *appropriate for* $\alpha$ at $\sigma$. In other words, a certain attribute value ($\sigma'$) is declared appropriate for $\alpha$ at $\sigma$. These attribute values must not be completely random either: for any subsort of $\sigma$, an appropriate attribute $\alpha$ of $\sigma$ is of course also appropriate to that subsort (by feature inheritance), but in addition, the value of $\alpha$ at that subsort must be at least as specific as it is at $\sigma$. This means the value is either $\sigma'$ or a subsort thereof. It may not be less specific, or, to put it differently, it may not be a supersort of $\sigma'$.

Some sorts in the sort hierarchy enjoy a special status by being *maximally specific*. They are called *species*. Species are sorts without proper subsorts. Sorts that are maximally specific and lack any appropriate attribute receive a special name and are called *atomic* sorts or simply *atoms*.

In addition to sorts and attributes, a signature provides relation symbols. Well-known examples are a ternary append relation and a binary member relation, but grammars may also require relations such as (often ternary) shuffle and binary o-command. Each relation symbol comes with a positive natural number for the number of arguments, its *arity*.

Putting all of this together, we obtain a definition of signatures as a septuple with sort hierarchy $\langle G, \sqsubseteq \rangle$, species $S$, attributes $A$, and relation symbols $R$; the function $F$ handles the feature appropriateness, and the function $\mathit{AR}$ gives the number of arguments of each relation.

<sup>7</sup>A partial order is given by a set whose elements stand in a reflexive, antisymmetric and transitive ordering relation.

<sup>8</sup>See Figure 1 and its explanation in Section 2 for an example which also points out the subtle distinction between the use of the term *appropriate to* (feature to sort) vs. the term *appropriate for* (sort value for a feature at a given sort).


**Definition 1** $\Sigma$ *is a* signature *iff* $\Sigma$ *is a septuple* $\langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $\langle G, \sqsubseteq \rangle$ *is a partial order,* $S = \{\sigma \in G \mid \textit{for each } \sigma' \in G, \textit{ if } \sigma' \sqsubseteq \sigma \textit{ then } \sigma = \sigma'\}$, $A$ *is a set,* $F$ *is a partial function from* $G \times A$ *to* $G$, *for each* $\sigma_1 \in G$, *for each* $\sigma_2 \in G$, *for each* $\alpha \in A$, *if* $F(\sigma_1, \alpha)$ *is defined and* $\sigma_2 \sqsubseteq \sigma_1$, *then* $F(\sigma_2, \alpha)$ *is defined and* $F(\sigma_2, \alpha) \sqsubseteq F(\sigma_1, \alpha)$, $R$ *is a finite set, and* $\mathit{AR}$ *is a total function from* $R$ *to the positive integers.*

The partial order $\langle G, \sqsubseteq \rangle$ is the sort hierarchy, and the set of sorts $G$, just like the set of attributes $A$, can in principle be infinite. In actual grammars it is finite, and in HPSG grammars it is also assumed that $G$ contains a top element, which is a sort that subsumes all other sorts in the sort hierarchy. $S$ is the set of maximally specific sorts, which will play a prominent role in the semantics of descriptions. $F$ is a function for fixing the appropriateness conditions on attributes and attribute values, and the conditions on that function reflect HPSG's restrictions on feature declarations. $F$ is called the *(feature) appropriateness function*. The last two lines of the definition provide the set of relation symbols, $R$, with their arity, $\mathit{AR}$. Relations are at least unary.
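The closure condition on $F$ can be made concrete with a few lines of code. The following Python sketch is an illustration under assumed encodings (the names `EDGES`, the dict representation of $F$, and the toy sorts are not taken from any published grammar): it computes the species as the sorts without proper subsorts and checks Definition 1's condition on the appropriateness function.

```python
# Toy sort hierarchy given by proper-subsort edges (child, parent); assumed data.
EDGES = {("word", "sign"), ("phrase", "sign"), ("nelist", "list"), ("elist", "list")}
SORTS = {"sign", "word", "phrase", "list", "nelist", "elist", "synsem"}

def subsorts_of(s):
    """All sorts at or below s (reflexive-transitive closure of EDGES)."""
    out, frontier = {s}, {s}
    while frontier:
        frontier = {a for (a, b) in EDGES if b in frontier} - out
        out |= frontier
    return out

SPECIES = {s for s in SORTS if subsorts_of(s) == {s}}  # maximally specific sorts

# Appropriateness F as a partial function (sort, attribute) -> value sort; toy data.
F = {("sign", "SYNSEM"): "synsem", ("word", "SYNSEM"): "synsem",
     ("phrase", "SYNSEM"): "synsem",
     ("nelist", "FIRST"): "synsem", ("nelist", "REST"): "list"}

def respects_inheritance(F):
    """Definition 1: if F(s1, a) is defined and s2 is a subsort of s1, then
    F(s2, a) is defined and at least as specific as F(s1, a)."""
    return all((s2, a) in F and F[(s2, a)] in subsorts_of(v1)
               for (s1, a), v1 in F.items() for s2 in subsorts_of(s1))

assert respects_inheritance(F)
assert SPECIES == {"word", "phrase", "nelist", "elist", "synsem"}
```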

Relations in HPSG often express relationships between lists (append, shuffle) or sets (union, intersection). Lists are usually encoded in HPSG with attributes FIRST and REST, and sorts *list*, *elist* (for empty list) and *nelist* (for non-empty list), but of course the exact naming does not matter. A fragment of the sort hierarchy which declares the sorts and attributes for regular lists is shown in Figure 2.

Figure 2: Fragment of a sort hierarchy for encoding lists

An AVM description of a list with two *synsem* objects can then be notated as in example (4a). Of course, grammar writers usually abbreviate list descriptions in AVMs by a syntax with angled brackets for superior readability, as shown in (4b), a more transparent rendering of (4a), but that is just a convention that presupposes the existence of a sort hierarchy fragment like in Figure 2.

$$\begin{aligned}
\text{(4)} \quad & \text{a. } \begin{bmatrix} \textit{nelist} \\ \text{FIRST} & \textit{synsem} \\ \text{REST} & \begin{bmatrix} \textit{nelist} \\ \text{FIRST} & \textit{synsem} \\ \text{REST} & \textit{elist} \end{bmatrix} \end{bmatrix} \\
& \text{b. } \left\langle \begin{bmatrix} \textit{synsem} \end{bmatrix}, \begin{bmatrix} \textit{synsem} \end{bmatrix} \right\rangle
\end{aligned}$$
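A brief Python rendering of the same point, with nested dicts standing in for feature structures (an assumed encoding, for illustration only): the angle-bracket notation of (4b) is syntactic sugar that unfolds into the FIRST/REST encoding of (4a).

```python
def encode_list(*elements):
    """Unfold angle-bracket sugar into the nelist/elist encoding of Figure 2."""
    result = {"sort": "elist"}
    for e in reversed(elements):
        result = {"sort": "nelist", "FIRST": e, "REST": result}
    return result

# The list of two synsem objects from (4):
assert encode_list({"sort": "synsem"}, {"sort": "synsem"}) == {
    "sort": "nelist", "FIRST": {"sort": "synsem"},
    "REST": {"sort": "nelist", "FIRST": {"sort": "synsem"},
             "REST": {"sort": "elist"}}}
```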

In combination with relations, grammarians occasionally require a more generalized use of lists (and sets) than their basic encoding above supports. Starting already with Pollard & Sag (1994), we find structures in arguments of relations which behave like regular lists or sets, except that they do not occur as attribute values anywhere in the structures in which the relations are supposed to hold.<sup>9</sup> In order to account for these applications of lists and sets in arguments of relations, RSRL introduces *chains*. Chains are handled with dedicated sorts and attributes with a fixed interpretation that extend every signature. They can be thought of as a more flexible treatment of lists alongside their regular explicit encoding in HPSG.

RSRL adds chains to all signatures. Informally, the extra symbols act very much like sorts and attributes for lists: *chain* for *list*, *echain* and *nechain* for *elist* and *nelist*, respectively, and the reserved symbols † and *⊲* for FIRST and REST. In order to integrate the reserved new sort symbols with any signature a linguist might specify, a distinguished sort *metatop* serves as unique top element of the extended sort hierarchy. The extensions are defined for any signature by adding reserved *pseudo-sorts* and *pseudo-attributes* and structuring the expanded sort hierarchy in the desired way:

**Definition 2** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $\widehat{G} = G \cup \{\textit{chain}, \textit{echain}, \textit{nechain}, \textit{metatop}\}$, $\widehat{\sqsubseteq} = \sqsubseteq \cup \{\langle \textit{echain}, \textit{chain} \rangle, \langle \textit{nechain}, \textit{chain} \rangle\} \cup \{\langle \sigma, \sigma \rangle \mid \sigma \in \widehat{G} \setminus G\} \cup \{\langle \sigma, \textit{metatop} \rangle \mid \sigma \in \widehat{G}\}$, $\widehat{S} = S \cup \{\textit{echain}, \textit{nechain}\}$, *and* $\widehat{A} = A \cup \{\dagger, \lhd\}$.

<sup>9</sup>See (3c) above for an example in the second argument of a binary relation set-of-elements.


The extended sort hierarchy relation, $\widehat{\sqsubseteq}$, simply integrates the new pseudo-sorts into the given relation by ordering *echain* and *nechain* under *chain*, keeping the reflexive closure intact and ordering every sort and pseudo-sort under the new top element of the partial order, *metatop*. Corresponding to *elist* and *nelist* above, *echain* and *nechain* are treated as maximally specific by including them in the extension of $S$, designated as $\widehat{S}$.<sup>10</sup> An AVM describing a chain with two *synsem* objects, corresponding to the description of a list with two *synsem* objects in (4a), now appears as follows:

$$\text{(5)} \quad \begin{bmatrix} \textit{nechain} \\ \dagger & \textit{synsem} \\ \lhd & \begin{bmatrix} \textit{nechain} \\ \dagger & \textit{synsem} \\ \lhd & \textit{echain} \end{bmatrix} \end{bmatrix}$$

Apart from the non-logical constants from (expanded) signatures and some logical symbols, a countably infinite set of variables is needed, which will be symbolized by $\mathit{VAR}$. Lower-case letters from the Latin alphabet serve as variable symbols, typically $x$.

For expository reasons, the syntax of descriptions, to be introduced next, does not employ AVMs, the common lingua franca of constraint-based grammar formalisms. The reasons are twofold: most importantly, although AVMs provide an extremely readable and flexible notation, they are quite cumbersome to define as a rigorous logical language which meets all the expressive needs of HPSG. Some of this awkwardness in explicit definitions derives from the very flexibility and redundancy in notation that makes AVMs perfect for everyday linguistic practice. Second, the original syntax of RSRL is, by contrast, easy to define, and, as long as it is not used for descriptions as complex as they occur in real grammars, its expressions are still transparent for everyone who is familiar with AVMs. Readers who want to explore how our description syntax relates to a formal syntax of AVMs are referred to Richter (2004) for details and a correspondence proof.

The definition of the syntax of descriptions proceeds in two steps, quite similar to first-order predicate logic. I will first introduce terms and then build formulæ and descriptions from terms. Terms are essentially what is known to linguists as *paths*, sequences of attributes:

<sup>10</sup>Extending the appropriateness function, $F$, is unnecessary since the relevant effects follow immediately from the semantics of the new reserved symbols in Definition 8.


**Definition 3** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $T_\Sigma$ *is the smallest set such that* $: \in T_\Sigma$, *for each* $v \in \mathit{VAR}$, $v \in T_\Sigma$, *and for each* $\alpha \in \widehat{A}$ *and each* $\tau \in T_\Sigma$, $\tau\alpha \in T_\Sigma$.

Simply put, sequences of attributes (including the two pseudo-attributes † and *⊲*) starting either with the colon or a single variable are Σ terms. Equipped with terms, we can immediately proceed to formulæ, the penultimate step on the way to descriptions. There are three kinds of simple formulæ: formulæ that assign a sort to the value of a path, formulæ which state that two paths have the same value (*structure sharing*, in linguistic terminology), and relational formulæ. Complex formulæ can be built from these by existential and universal quantification, negation, and the classical binary logical connectives.

**Definition 4** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $F_\Sigma$ *is the smallest set such that for each* $\sigma \in \widehat{G}$, *for each* $\tau \in T_\Sigma$, $\tau \sim \sigma \in F_\Sigma$, *for each* $\tau_1, \tau_2 \in T_\Sigma$, $\tau_1 \approx \tau_2 \in F_\Sigma$, *for each* $\rho \in R$, *for each* $x_1, \dots, x_{\mathit{AR}(\rho)} \in \mathit{VAR}$, $\rho(x_1, \dots, x_{\mathit{AR}(\rho)}) \in F_\Sigma$, *for each* $x \in \mathit{VAR}$, *for each* $\varphi \in F_\Sigma$, $\exists x\, \varphi \in F_\Sigma$ *(analogous for* $\forall$*)*, *for each* $\varphi \in F_\Sigma$, $\neg\varphi \in F_\Sigma$, *and for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $(\varphi_1 \land \varphi_2) \in F_\Sigma$ *(analogous for* $\lor$, $\rightarrow$, $\leftrightarrow$*)*.

In this syntax, the Head Feature Principle of (3a) can be rendered as in (6a) or, equivalently, as in (6b).<sup>11</sup>

- a. $(: \sim \textit{headed-phrase}) \rightarrow\ : \text{SYNSEM LOCAL CATEGORY HEAD} \approx\ : \text{HEAD-DTR SYNSEM LOCAL CATEGORY HEAD}$
- b. $(: \sim \textit{headed-phrase}) \rightarrow \exists x\, (: \text{SYNSEM LOCAL CATEGORY HEAD} \approx x \land\ : \text{HEAD-DTR SYNSEM LOCAL CATEGORY HEAD} \approx x)$

Finally, $\mathit{FV}$ is a function that determines for every $\Sigma$ term and $\Sigma$ formula the set of variables that occur free in them.

**Definition 5** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $\mathit{FV}(:) = \emptyset$, *for each* $v \in \mathit{VAR}$, $\mathit{FV}(v) = \{v\}$, *for each* $\tau \in T_\Sigma$, *for each* $\alpha \in \widehat{A}$, $\mathit{FV}(\tau\alpha) = \mathit{FV}(\tau)$, *for each* $\tau \in T_\Sigma$, *for each* $\sigma \in \widehat{G}$, $\mathit{FV}(\tau \sim \sigma) = \mathit{FV}(\tau)$, *for each* $\tau_1, \tau_2 \in T_\Sigma$, $\mathit{FV}(\tau_1 \approx \tau_2) = \mathit{FV}(\tau_1) \cup \mathit{FV}(\tau_2)$, *for each* $\rho \in R$, *for each* $x_1, \dots, x_{\mathit{AR}(\rho)} \in \mathit{VAR}$, $\mathit{FV}(\rho(x_1, \dots, x_{\mathit{AR}(\rho)})) = \{x_1, \dots, x_{\mathit{AR}(\rho)}\}$, *for each* $\varphi \in F_\Sigma$, *for each* $x \in \mathit{VAR}$, $\mathit{FV}(\exists x\, \varphi) = \mathit{FV}(\varphi) \setminus \{x\}$ *(analogous for* $\forall$*)*, *for each* $\varphi \in F_\Sigma$, $\mathit{FV}(\neg\varphi) = \mathit{FV}(\varphi)$, *and for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $\mathit{FV}((\varphi_1 \land \varphi_2)) = \mathit{FV}(\varphi_1) \cup \mathit{FV}(\varphi_2)$ *(analogous for* $\lor$, $\rightarrow$, $\leftrightarrow$*)*.

<sup>11</sup>The brackets in the antecedent are for readability.

Informally, an occurrence of a variable is free in a Σ term or a Σ formula if it is not bound by a quantifier. Σ formulæ without free occurrences of variables are a kind of formula of special interest, and the term Σ *description* is reserved for them:

**Definition 6** *For each signature* $\Sigma$, $D^0_\Sigma = \{\varphi \in F_\Sigma \mid \mathit{FV}(\varphi) = \emptyset\}$.

$D^0_\Sigma$ is the set of $\Sigma$ descriptions. When a signature is fixed by the context, or when the exact signature is irrelevant in the discussion, we can simply speak of *descriptions* instead of $\Sigma$ descriptions. Descriptions are the syntactic units that linguists use in grammar writing. (6a) and (6b) are descriptions. Grammars, as we will see in Section 5, are written by declaring a signature and stating a set of descriptions. But before grammars and their meaning can be investigated, the meaning of signatures and of descriptions must be explained.
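The free-variable bookkeeping of Definitions 3–6 is easy to operationalize. The sketch below encodes terms and formulæ as nested Python tuples (an assumed, illustrative encoding, not RSRL's official syntax) and computes $\mathit{FV}$ by structural recursion, confirming that the variable rendering of the Head Feature Principle in (6b) is a description.

```python
from functools import reduce

def fv_term(t):
    if t[0] == "colon":                  # the term ':'
        return set()
    if t[0] == "var":                    # a variable
        return {t[1]}
    return fv_term(t[1])                 # ("attr", tau, alpha): FV(tau alpha) = FV(tau)

def fv(phi):
    kind = phi[0]
    if kind == "sort":                   # tau ~ sigma
        return fv_term(phi[1])
    if kind == "eq":                     # tau1 ≈ tau2
        return fv_term(phi[1]) | fv_term(phi[2])
    if kind == "rel":                    # rho(x1, ..., xn)
        return set(phi[2])
    if kind in ("exists", "forall"):     # quantifiers bind their variable
        return fv(phi[2]) - {phi[1]}
    if kind == "not":
        return fv(phi[1])
    return fv(phi[1]) | fv(phi[2])       # and / or / imp / iff

def path(*attrs):
    """The term ':' followed by a sequence of attributes."""
    return reduce(lambda t, a: ("attr", t, a), attrs, ("colon",))

hfp = ("imp", ("sort", ("colon",), "headed-phrase"),
       ("exists", "x",
        ("and", ("eq", path("SYNSEM", "LOCAL", "CATEGORY", "HEAD"), ("var", "x")),
                ("eq", path("HEAD-DTR", "SYNSEM", "LOCAL", "CATEGORY", "HEAD"), ("var", "x")))))

assert fv(hfp) == set()   # no free variables: (6b) is a description per Definition 6
```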

# **4 Meaning of signatures and descriptions**

Descriptions of RSRL are interpreted similarly to expressions of classical logics such as first-order logic, except that they are not evaluated as true or false in a given structure; instead, they denote collections of structures.

Defining the meaning of descriptions begins with delineating the structures which interpret signatures. In particular, species and attributes must receive a meaning, which should be tied to the HPSG-specific intentions behind sort hierarchies and feature declarations; and so must relation symbols, whose interpretation should heed their arity. Due to some extra restrictions which will ultimately be imposed on the interpretation of relation symbols (to meet intuitions of grammarians) and whose formulation presupposes a notion of term interpretation, I start with *initial interpretations*. They will be refined in a second step to full interpretations (Definition 13).


Some additional notation is convenient in the upcoming definition of initial interpretations. If $X$ is a set, $X^*$ is the set of all finite sequences (or tuples) of elements of $X$; $X^+$ is the same set without the empty sequence; and $\overline{X}$ is short for the set $X \cup X^*$. Initial interpretations employ a set $\mathsf{U}$ of entities which form the domain of grammars. The functions $\mathsf{S}$, $\mathsf{A}$ and $\mathsf{R}$ interpret sort symbols, attribute symbols and relation symbols in that domain, respecting certain general restrictions which come with HPSG's ontological assumptions about languages. In particular, the behavior of attribute interpretation is tied to the feature appropriateness conditions, i.e. feature inheritance in the sort hierarchy.

**Definition 7** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, $\mathsf{I}$ *is an* initial $\Sigma$ interpretation *iff* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, $\mathsf{U}$ *is a set,* $\mathsf{S}$ *is a total function from* $\mathsf{U}$ *to* $S$, $\mathsf{A}$ *is a total function from* $A$ *to the set of partial functions from* $\mathsf{U}$ *to* $\mathsf{U}$, *for each* $\alpha \in A$ *and each* $u \in \mathsf{U}$, *if* $\mathsf{A}(\alpha)(u)$ *is defined then* $F(\mathsf{S}(u), \alpha)$ *is defined and* $\mathsf{S}(\mathsf{A}(\alpha)(u)) \sqsubseteq F(\mathsf{S}(u), \alpha)$, *for each* $\alpha \in A$ *and each* $u \in \mathsf{U}$, *if* $F(\mathsf{S}(u), \alpha)$ *is defined then* $\mathsf{A}(\alpha)(u)$ *is defined,* $\mathsf{R}$ *is a total function from* $R$ *to the power set of* $\bigcup_{n \in \mathbb{N}} \overline{\mathsf{U}}^{\,n}$, *and for each* $\rho \in R$, $\mathsf{R}(\rho) \subseteq \overline{\mathsf{U}}^{\,\mathit{AR}(\rho)}$.

Initial $\Sigma$ interpretations consist of four components. The first three of them will remain unchanged in full $\Sigma$ interpretations (Definition 13). The elements of $\mathsf{U}$ are entities which populate the universe of structures. Their ontological status has been debated fiercely in HPSG, and will be discussed in Sections 5 and 6. For the moment, assume that they are either linguistic objects or appropriate abstractions thereof. $\mathsf{S}$ assigns each object in the universe a species, which is another way of saying that each object is of exactly one maximally specific sort. This is what is known as the property of being *sort-resolved*. The attribute interpretation function $\mathsf{A}$ interprets each attribute symbol as a (partial) function that assigns an object of the universe to an object of the universe, and as such it obeys the restrictions of the feature declarations of the signature, embodied in the function $F$: attributes are defined on all and only those objects $u_1$ which have a species to which the attributes are appropriate according to $F$; and the object which $u_1$ is mapped to by the attribute must in turn be of a species which is appropriate for the attribute (at the species of $u_1$). This is what is known as the property of interpreting structures as being *totally well-typed*. Originally both of these properties of interpreting structures were formulated with respect to so-called *feature structures*, but, as we will see below, this conception of interpreting structures for grammars was soon given up for philosophical reasons.<sup>12</sup> The relation interpretation function $\mathsf{R}$ finally interprets $n$-ary relation symbols as sets of $n$-tuples of objects. However, there is an additional option, which makes the definition look more complex: an object in an $n$-tuple may in fact not be an atomic object; it can alternatively be a tuple of objects itself. These tuples in argument positions of relations will be described as *chains* with the pseudo-sorts and pseudo-attributes, which were added to signatures in Definition 2 above. As pointed out there, chains are a construct which gives grammarians the flexibility to use (finite) lists in all the ways in which they are put in relations in actual HPSG grammars (see (3c) for an example).

Since chains are provided by an extension of the set of sort symbols and attributes (Definition 2), the interpretation of the additional symbols must be defined separately. This is very simple, since these symbols behave essentially analogously to the conventional sort and attribute symbols of HPSG's list encoding.

**Definition 8** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$,

$\widehat{\mathsf{S}}$ *is the total function from* $\overline{\mathsf{U}}$ *to* $\widehat{S}$ *such that*

*for each* $u \in \mathsf{U}$, $\widehat{\mathsf{S}}(u) = \mathsf{S}(u)$,

*for each* $u_1, \dots, u_n \in \mathsf{U}$, $\widehat{\mathsf{S}}(\langle u_1, \dots, u_n \rangle) = \textit{echain}$ *if* $n = 0$ *and* $\textit{nechain}$ *if* $n > 0$, *and*

$\widehat{\mathsf{A}}$ *is the total function from* $\widehat{A}$ *to the set of partial functions from* $\overline{\mathsf{U}}$ *to* $\overline{\mathsf{U}}$ *such that for each* $\alpha \in A$, $\widehat{\mathsf{A}}(\alpha) = \mathsf{A}(\alpha)$, *and for each* $u_1, \dots, u_n \in \mathsf{U}$ *with* $n > 0$, $\widehat{\mathsf{A}}(\dagger)(\langle u_1, \dots, u_n \rangle) = u_1$ *and* $\widehat{\mathsf{A}}(\lhd)(\langle u_1, \dots, u_n \rangle) = \langle u_2, \dots, u_n \rangle$.


$\widehat{\mathsf{S}}$ is the *expanded species assignment function*, and $\widehat{\mathsf{A}}$ is the *expanded attribute interpretation function*. The pseudo-species symbols *echain* and *nechain* label empty chains and non-empty chains, respectively. Given a non-empty chain, the pseudo-attribute $\dagger$ picks out its first member, corresponding to the function of the FIRST attribute on non-empty lists. Conversely, $\lhd$ cuts off the first element of

<sup>12</sup>Of course, the informal term *feature structure* is still alive among linguists, and in a technical sense, feature structures are essential constructs for implementation platforms.


a non-empty chain and returns the remainder of the chain, as does the standard attribute REST for lists.

In addition to attributes, terms may also contain variables (Definition 3). Term interpretation thus requires a notion of *variable assignments* in (initial) interpretations.

**Definition 9** *For each signature* $\Sigma$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, $\mathsf{G}_\mathsf{I} = \overline{\mathsf{U}}^{\mathit{VAR}}$ *is the set of variable assignments in* $\mathsf{I}$.

An element of $\mathsf{G}_\mathsf{I}$ (the set of total functions from the set of variables to the set of objects and chains of objects of $\mathsf{U}$) will be notated as $g$, following a convention frequently observed in predicate logic. With variable assignments in (initial) interpretations, variables denote objects in the universe $\mathsf{U}$ and chains of objects of the universe.

Terms map objects of the universe to objects (or chains of objects) of the universe, as determined by a term interpretation function $\mathsf{T}^{g}_{\mathsf{I}}$:

**Definition 10** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, *for each* $g \in \mathsf{G}_\mathsf{I}$, $\mathsf{T}^{g}_{\mathsf{I}}$ *is the total function from* $T_\Sigma$ *to the set of partial functions from* $\overline{\mathsf{U}}$ *to* $\overline{\mathsf{U}}$ *such that for each* $u \in \overline{\mathsf{U}}$,

$\mathsf{T}^{g}_{\mathsf{I}}(:)(u)$ *is defined and* $\mathsf{T}^{g}_{\mathsf{I}}(:)(u) = u$,

*for each* $v \in \mathit{VAR}$, $\mathsf{T}^{g}_{\mathsf{I}}(v)(u)$ *is defined and* $\mathsf{T}^{g}_{\mathsf{I}}(v)(u) = g(v)$,

*for each* $\tau \in T_\Sigma$, *for each* $\alpha \in \widehat{A}$,

$\mathsf{T}^{g}_{\mathsf{I}}(\tau\alpha)(u)$ *is defined iff* $\mathsf{T}^{g}_{\mathsf{I}}(\tau)(u)$ *is defined and* $\widehat{\mathsf{A}}(\alpha)(\mathsf{T}^{g}_{\mathsf{I}}(\tau)(u))$ *is defined, and*

*if* $\mathsf{T}^{g}_{\mathsf{I}}(\tau\alpha)(u)$ *is defined, then* $\mathsf{T}^{g}_{\mathsf{I}}(\tau\alpha)(u) = \widehat{\mathsf{A}}(\alpha)(\mathsf{T}^{g}_{\mathsf{I}}(\tau)(u))$.

$\mathsf{T}^{g}_{\mathsf{I}}$ is called the *term interpretation function under* $\mathsf{I}$ *under* $g$. $\Sigma$ terms either start with a variable or with the special symbol colon (':'). The colon denotes the identity function. Interpreted on any object, it returns that object. If a term starts with the colon, its term interpretation starts, so to speak, at the object to which it is applied ($\mathsf{T}^{g}_{\mathsf{I}}(:)(u) = u$) and, if each subsequent attribute in the term is defined on the object to which the interpretation of the earlier attribute(s) took us, the term interpretation will yield the object reached by the last attribute. When a $\Sigma$ term starts with a variable $v$, the given variable assignment will determine the starting point of interpreting the sequence of attributes ($g(v)$). Of course, variables may be assigned chains of objects, in which case the symbols of the expanded attribute set can be used to navigate the elements of the chain.
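Operationally, this clause of Definition 10 is just composition of partial functions. The Python sketch below (objects as integers, attribute interpretations as dicts; all data assumed for illustration) returns `None` where the partial function is undefined.

```python
U = {0, 1, 2, 3}
ATTR_I = {                       # the function A: attribute -> partial function on U
    "SYNSEM": {0: 1},            # defined only on object 0
    "LOCAL": {1: 2},
    "HEAD": {2: 3},
}

def interpret_path(attrs, u):
    """T(': a1 ... an')(u), with None marking undefinedness."""
    for a in attrs:
        if u not in ATTR_I.get(a, {}):
            return None
        u = ATTR_I[a][u]
    return u

assert interpret_path(["SYNSEM", "LOCAL", "HEAD"], 0) == 3
assert interpret_path(["SYNSEM", "LOCAL", "HEAD"], 1) is None  # SYNSEM is undefined on 1
```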

The set of objects which are reachable from a single given object in an interpretation by following sequences of attribute interpretations is important for the way in which quantification is conceived of by grammarians. It also plays a role in thinking about which objects can in principle stand in a relation, and it is crucial for explicating different notions of the meaning of grammars. Definition 11 captures this notion, the set of components of an object in an (initial) interpretation. Note that all terms in Definition 11 start with the colon.

**Definition 11** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, *for each* $u \in \mathsf{U}$,

$$\mathsf{C}^{u}_{\mathsf{I}} = \left\{ u' \in \mathsf{U} \,\middle|\, \begin{array}{l} \textit{for some } g \in \mathsf{G}_\mathsf{I}, \\ \textit{for some } \pi \in A^{*}, \\ \mathsf{T}^{g}_{\mathsf{I}}(:\!\pi)(u) \textit{ is defined, and} \\ u' = \mathsf{T}^{g}_{\mathsf{I}}(:\!\pi)(u) \end{array} \right\}$$

$\mathsf{C}^{u}_{\mathsf{I}}$ is the set of components of $u$ in $\mathsf{I}$. The purpose of $\mathsf{C}^{u}_{\mathsf{I}}$ is to capture the set of all objects that are reachable from some object $u$ in the universe by following a path of interpreted attributes. Thinking of these configurations as directed graphs, the set of components of $u$ in $\mathsf{I}$ is the set of nodes that can be reached by following any sequence of edges (in the direction of attribute interpretation) starting from $u$. This corresponds to how linguists normally conceive of the substructures of some structured object.<sup>13</sup> The set of components of objects is used in two ways in the definitions of full interpretations and description denotations: it restricts the set of objects that are permitted in relations, and it provides the domain of quantification in quantificational expressions of the logical language.
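Viewed as a directed graph, the component set is plain reachability, which a standard search computes. The following Python sketch mirrors Definition 11 on assumed toy data:

```python
ATTR_I = {"SYNSEM": {0: 1}, "LOCAL": {1: 2}, "HEAD": {2: 3}, "HEAD-DTR": {0: 0}}

def components(u):
    """All objects reachable from u by following attribute interpretations."""
    seen, frontier = {u}, [u]
    while frontier:
        v = frontier.pop()
        for f in ATTR_I.values():
            if v in f and f[v] not in seen:
                seen.add(f[v])
                frontier.append(f[v])
    return seen

assert components(0) == {0, 1, 2, 3}
assert components(2) == {2, 3}   # objects 'above' 2 are not among its components
```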

According to Definition 7 of initial interpretations, relation symbols are simply interpreted as sets of tuples of objects (and chains of objects) in the universe of interpretation. However, HPSGians have a slightly more restricted notion of relations: for them, relations hold between objects that occur within a sign (or a similar kind of larger linguistic structure); they are not relations between objects that occur in separate (unconnected) signs. The following notion of *possible relation tuples in an interpretation* captures this intuition.

**Definition 12** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$,

$$\mathsf{RT}_\mathsf{I} = \bigcup_{n \in \mathbb{N}} \left\{ \langle u_1, \dots, u_n \rangle \in \overline{\mathsf{U}}^{\,n} \,\middle|\, \begin{array}{l} \textit{for some } u \in \mathsf{U}, \\ \textit{for each } i \in \mathbb{N} \textit{ with } 1 \le i \le n, \\ u_i \in \overline{\mathsf{C}^{u}_{\mathsf{I}}} \end{array} \right\}$$

<sup>13</sup>Phrasing this more carefully, the object itself is not structured, but there is a structure generated by the object by following the edges, or more technically, by the composition of functions which interpret attribute symbols.

$\mathsf{RT}_\mathsf{I}$ is the set of possible relation tuples in $\mathsf{I}$. Possible relation tuples in an initial interpretation are characterized by the existence of some object in the interpretation from which each object in a relation tuple can be reached by a sequence of attribute interpretations. In case an argument in a tuple is a chain, the objects on the chain are restricted in the same way.

The notion of *full interpretations* integrates the restriction on possible relations, keeping everything else unchanged from initial interpretations:

**Definition 13** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each initial* $\Sigma$ *interpretation* $\mathsf{I}' = \langle \mathsf{U}', \mathsf{S}', \mathsf{A}', \mathsf{R}' \rangle$, *for the set of possible relation tuples in* $\mathsf{I}'$, $\mathsf{RT}_{\mathsf{I}'}$, $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$ *is a* full $\Sigma$ interpretation *iff* $\mathsf{U} = \mathsf{U}'$, $\mathsf{S} = \mathsf{S}'$, $\mathsf{A} = \mathsf{A}'$, $\mathsf{R}$ *is a total function from* $R$ *to the power set of* $\mathsf{RT}_{\mathsf{I}'}$, *and for each* $\rho \in R$, $\mathsf{R}(\rho) \subseteq \mathsf{RT}_{\mathsf{I}'} \cap \overline{\mathsf{U}}^{\,\mathit{AR}(\rho)}$.

It can be checked that variable assignments in initial interpretations and sets of components of objects in initial interpretations are the same as in corresponding full interpretations with the same universe, species interpretation and attribute interpretation functions, since variable assignments and sets of components of objects do not depend on the interpretation of relations. From now on, all of the above will be used with respect to full interpretations, and full interpretations will simply be called interpretations.

Everything is now ready to define the meaning of formulæ in interpretations as sets of objects in an interpretation. A sort assignment formula constructed from a term, a reserved assignment symbol and a sort symbol, such as $:\!\text{CASE} \sim \textit{nominative}$, denotes the set of objects in the interpretation on which the CASE attribute is defined and, when interpreted on them, leads to an object of sort *nominative*; and the path equation $:\!\text{SYNSEM LOCAL CATEGORY HEAD} \approx\ :\!\text{HEAD-DTR SYNSEM LOCAL CATEGORY HEAD}$ denotes those objects on which the two given paths are defined and lead to the same object. Relational formulæ, the third kind of atomic formula, also denote sets of objects and will be discussed in more detail below. Existential quantification and universal quantification are restricted to components of objects; and the logical connectives are treated with the familiar operations of set union (disjunction), set intersection (conjunction) and set complement (negation), or with combinations thereof (implication, bi-implication). The definition of $\Sigma$ formula denotation for quantificational expressions needs a notation for modifying variable assignments with respect to the value of designated variables. For any variable assignment $g \in \mathsf{G}_\mathsf{I}$, $g' = g[x \mapsto u]$ is just like $g$ except that $g'$ maps the variable $x$ to the object $u$ (possibly a tuple).


**Definition 14** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each (full)* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, *for each* $g \in \mathsf{G}_\mathsf{I}$, $\mathsf{D}^{g}_{\mathsf{I}}$ *is the total function from* $F_\Sigma$ *to the power set of* $\mathsf{U}$ *such that*

*for each* $\tau \in T_\Sigma$, *for each* $\sigma \in \widehat{G}$, $\mathsf{D}^{g}_{\mathsf{I}}(\tau \sim \sigma) = \{u \in \mathsf{U} \mid \mathsf{T}^{g}_{\mathsf{I}}(\tau)(u) \textit{ is defined, and } \widehat{\mathsf{S}}(\mathsf{T}^{g}_{\mathsf{I}}(\tau)(u)) \mathrel{\widehat{\sqsubseteq}} \sigma\}$,

*for each* $\tau_1, \tau_2 \in T_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}(\tau_1 \approx \tau_2) = \{u \in \mathsf{U} \mid \mathsf{T}^{g}_{\mathsf{I}}(\tau_1)(u) \textit{ is defined, } \mathsf{T}^{g}_{\mathsf{I}}(\tau_2)(u) \textit{ is defined, and } \mathsf{T}^{g}_{\mathsf{I}}(\tau_1)(u) = \mathsf{T}^{g}_{\mathsf{I}}(\tau_2)(u)\}$,

*for each* $\rho \in R$, *for each* $x_1, \dots, x_{\mathit{AR}(\rho)} \in \mathit{VAR}$, $\mathsf{D}^{g}_{\mathsf{I}}(\rho(x_1, \dots, x_{\mathit{AR}(\rho)})) = \{u \in \mathsf{U} \mid \langle g(x_1), \dots, g(x_{\mathit{AR}(\rho)}) \rangle \in \mathsf{R}(\rho)\}$,

*for each* $x \in \mathit{VAR}$, *for each* $\varphi \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}(\exists x\, \varphi) = \{u \in \mathsf{U} \mid \textit{for some } u' \in \overline{\mathsf{C}^{u}_{\mathsf{I}}},\ u \in \mathsf{D}^{g[x \mapsto u']}_{\mathsf{I}}(\varphi)\}$,

*for each* $x \in \mathit{VAR}$, *for each* $\varphi \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}(\forall x\, \varphi) = \{u \in \mathsf{U} \mid \textit{for each } u' \in \overline{\mathsf{C}^{u}_{\mathsf{I}}},\ u \in \mathsf{D}^{g[x \mapsto u']}_{\mathsf{I}}(\varphi)\}$,

*for each* $\varphi \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}(\neg\varphi) = \mathsf{U} \setminus \mathsf{D}^{g}_{\mathsf{I}}(\varphi)$,

*for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}((\varphi_1 \land \varphi_2)) = \mathsf{D}^{g}_{\mathsf{I}}(\varphi_1) \cap \mathsf{D}^{g}_{\mathsf{I}}(\varphi_2)$,

*for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}((\varphi_1 \lor \varphi_2)) = \mathsf{D}^{g}_{\mathsf{I}}(\varphi_1) \cup \mathsf{D}^{g}_{\mathsf{I}}(\varphi_2)$,

*for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}((\varphi_1 \rightarrow \varphi_2)) = (\mathsf{U} \setminus \mathsf{D}^{g}_{\mathsf{I}}(\varphi_1)) \cup \mathsf{D}^{g}_{\mathsf{I}}(\varphi_2)$, *and*

*for each* $\varphi_1, \varphi_2 \in F_\Sigma$, $\mathsf{D}^{g}_{\mathsf{I}}((\varphi_1 \leftrightarrow \varphi_2)) = ((\mathsf{U} \setminus \mathsf{D}^{g}_{\mathsf{I}}(\varphi_1)) \cap (\mathsf{U} \setminus \mathsf{D}^{g}_{\mathsf{I}}(\varphi_2))) \cup (\mathsf{D}^{g}_{\mathsf{I}}(\varphi_1) \cap \mathsf{D}^{g}_{\mathsf{I}}(\varphi_2))$.

$\mathsf{D}^{g}_{\mathsf{I}}$ is the $\Sigma$ *formula interpretation function with respect to* $\mathsf{I}$ *under a variable assignment,* $g$, *in* $\mathsf{I}$. Sort assignment formulæ, $\tau \sim \sigma$, denote sets of objects on which the attribute path is defined and leads to an object $u'$ of sort $\sigma$. If $\sigma$ is not a species, the object $u'$ must be of a maximally specific subsort of $\sigma$. Path equations of the form $\tau_1 \approx \tau_2$ hold of an object when path $\tau_1$ and path $\tau_2$ lead to the same object $u'$. And an $n$-ary relational formula $\rho(x_1, \dots, x_n)$ denotes the set of objects for which the $n$-tuple of objects (or chains of objects) assigned to the variables $x_1$ to $x_n$ is in the denotation of the relation $\rho$. This means that a relational formula either denotes the entire universe $\mathsf{U}$ or the empty set, depending on the variable assignment in $\mathsf{I}$. For example, according to Definition 14, the formula $\text{append}(x_1, x_2, x_3)$ denotes the universe of objects if the triple $\langle g(x_1), g(x_2), g(x_3) \rangle$ is in $\mathsf{R}(\text{append})$, or else the empty set. We will return to the meaning of relational formulæ after defining the meaning of grammars to confirm that this is a useful way to determine their denotation.

Negation is interpreted as set complement of the denotation of a formula, conjunction and disjunction of formulæ as set intersection and set union of the denotations of two formulæ, respectively. The meaning of implication and bi-implication follows the pattern of classical logic and could alternatively be defined on the basis of negation and disjunction (or conjunction) alone. Quantificational expressions are special in that they implement the idea of restricted quantification by referring to the set of components of objects in $\mathsf{I}$. An existentially quantified formula, $\exists x\, \varphi$, denotes the set of objects $u$ such that there is at least one component (or chain of components) $u'$ of $u$, and interpreting $x$ as $u'$ leads to $\varphi$ describing $u$. With universal quantification, the corresponding condition must hold for *all* components (or chains of components) of the objects in the denotation of the quantified formula. Again turning to the application of these definitions of formula denotations in grammar writing, the intuition is that linguists quantify over the components of grammatical structures (sentences, phrases), and not over a universe of objects that may include unrelated sentences and grammatical structures, or components thereof: a certain kind of object exists within a given structure, or all objects in a certain structure fulfill certain conditions.
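The quantifier clauses translate into a few lines of Python. The sketch below simplifies deliberately (an assumption made for readability, not part of Definition 14): instead of a full formula with a variable assignment, the body of the quantifier is flattened into a predicate on the witness component.

```python
ATTR_I = {"FIRST": {0: 1}, "REST": {0: 2}}        # a one-element list rooted in object 0
SORT_I = {0: "nelist", 1: "synsem", 2: "elist"}   # species assignment; toy data

def components(u):
    seen, frontier = {u}, [u]
    while frontier:
        v = frontier.pop()
        for f in ATTR_I.values():
            if v in f and f[v] not in seen:
                seen.add(f[v])
                frontier.append(f[v])
    return seen

def exists(u, pred):
    """Restricted existential: some component of u satisfies pred."""
    return any(pred(c) for c in components(u))

def forall(u, pred):
    """Restricted universal: every component of u satisfies pred."""
    return all(pred(c) for c in components(u))

assert exists(0, lambda c: SORT_I[c] == "synsem")      # 0 has a synsem component
assert not forall(0, lambda c: SORT_I[c] == "synsem")  # but not all components are synsem
```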

A standard proof shows that the denotation of $\Sigma$ formulæ without free occurrences of variables, i.e. the denotation of $\Sigma$ descriptions, is independent of the initial choice of variable assignment. For $\Sigma$ descriptions, I can thus define a simpler $\Sigma$ *description denotation function with respect to an interpretation* $\mathsf{I}$, $\mathsf{D}_\mathsf{I}$:

**Definition 15** *For each signature* $\Sigma = \langle G, \sqsubseteq, S, A, F, R, \mathit{AR} \rangle$, *for each (full)* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, $\mathsf{D}_\mathsf{I}$ *is the total function from* $D^0_\Sigma$ *to the power set of* $\mathsf{U}$ *such that* $\mathsf{D}_\mathsf{I}(\delta) = \{u \in \mathsf{U} \mid \textit{for each } g \in \mathsf{G}_\mathsf{I},\ u \in \mathsf{D}^{g}_{\mathsf{I}}(\delta)\}$.

For each description $\delta$, $\mathsf{D}_\mathsf{I}$ returns the set of objects in the universe of $\mathsf{I}$ that are described by $\delta$. With $\Sigma$ descriptions and their denotation as sets of objects, everything is in place to symbolize all grammar principles of a grammar such as the one presented by Pollard & Sag (1994) in logical notation, and the grammar principles receive an interpretation along the lines informally characterized by Pollard and Sag. A comprehensive logical rendering of their grammar of English can be found in Appendix C of Richter (2004). It includes the treatments of (finite) sets and of parametric sorts (such as *list(synsem)*), which are not specifically addressed – but implicitly covered – in the preceding presentation. Moreover, as shown there, all syntactic constructs of the logical languages above are necessary to achieve that goal without reformulating the grammar.

# **5 Meaning of grammars**

Grammars comprise sets of descriptions, the principles of grammar. These sets of principles are often called *theories* in the context of logical languages for HPSG, although this terminology can occasionally be confusing.<sup>14</sup> Theories, i.e. sets of descriptions, are symbolized with $\theta$. A grammar is simply a theory together with a signature:

**Definition 16** $\Gamma$ *is a* grammar *iff* $\Gamma$ *is a pair* $\langle \Sigma, \theta \rangle$, *where* $\Sigma$ *is a signature, and* $\theta \subseteq D^0_\Sigma$.

Essentially, the denotation of a theory can be thought of as the denotation of the conjunction of the descriptions in the theory. The difference is that theories can, in principle (and contrary to deliberate linguistic convention), be infinite in the sense of containing infinitely many descriptions. Conjunctions of descriptions are finite, since conjunctive formulæ are finite.

**Definition 17** *For each signature* $\Sigma$, *for each* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, $\Theta_\mathsf{I}$ *is the total function from the power set of* $D^0_\Sigma$ *to the power set of* $\mathsf{U}$ *such that for each* $\theta \subseteq D^0_\Sigma$,

$\Theta_\mathsf{I}(\theta) = \{u \in \mathsf{U} \mid \textit{for each } \delta \in \theta,\ u \in \mathsf{D}_\mathsf{I}(\delta)\}$.

$\Theta_\mathsf{I}$ is the *theory denotation function with respect to* $\mathsf{I}$. A theory $\theta$, consisting of a set of descriptions, holds of an object $u$ in the universe exactly if every description in the theory holds of $u$. In short, a theory denotes the set of objects that are described by everything in the theory. These objects do not violate any restriction that the theory expresses in one of its descriptions.

A first approximation to the meaning of grammars is provided by the notion of a Γ model, a model of a grammar Γ:

**Definition 18** *For each grammar* $\Gamma = \langle \Sigma, \theta \rangle$, *for each* $\Sigma$ *interpretation* $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$, $\mathsf{I}$ *is a* $\Gamma$ model *iff* $\Theta_\mathsf{I}(\theta) = \mathsf{U}$.

A $\Gamma$ model is an interpretation $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$ in which every description in the theory $\theta$ of grammar $\Gamma$ describes every object in the interpretation's universe $\mathsf{U}$. In other words, each object in the interpretation fulfills all conditions which are imposed by the grammar principles. There is no object in a $\Gamma$ model that violates any principle.
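Checking the model condition of Definitions 17–18 is a universally quantified test over the universe. A Python sketch, with descriptions as predicates over a two-object toy universe (all names and data are illustrative assumptions):

```python
U = [{"sort": "word", "PHON": ["kim"]},
     {"sort": "phrase", "PHON": ["kim", "sleeps"]}]

theory = [
    lambda u: u["sort"] in ("word", "phrase"),  # every object is a sign
    lambda u: len(u["PHON"]) >= 1,              # every sign has non-empty phonology
]

def theory_denotation(theory, U):
    """Theta: the objects described by every description in the theory."""
    return [u for u in U if all(d(u) for d in theory)]

def is_model(theory, U):
    """Definition 18: a model is an interpretation whose whole universe survives."""
    return theory_denotation(theory, U) == U

assert is_model(theory, U)
```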

Models of grammars are an appropriate starting point for revisiting the denotation of relational formulæ. Assume we want to define a unary relation synsem-rel which contains all objects of sort *synsem* of a typical HPSG grammar. To achieve this, we declare the relation symbol synsem-rel in the signature and we add the description in (7) to the theory of the grammar Γ:

<sup>14</sup>The problem with this term is that it can be argued that theories, defined this way, do not constitute what would traditionally be called a *theory of a language*, since many central aspects of a theory in the latter sense are not embodied in that kind of formalized theory.


(7) $\forall x\, (\text{synsem-rel}(x) \leftrightarrow x \sim \textit{synsem})$

Consider a non-empty $\Gamma$ model $\mathsf{I}$ containing words and phrases. Since by assumption (7) is in the theory of $\Gamma$, and we consider a *model*, (7) describes every object in the universe of $\mathsf{I}$. By the bi-implication, every component object of every object in $\mathsf{I}$ which is of sort *synsem* is in $\mathsf{R}(\text{synsem-rel})$ (right to left), and every element of $\mathsf{R}(\text{synsem-rel})$ is a *synsem* object (left to right). But if the bi-implication in (7) holds in both directions in $\mathsf{I}$, it follows that the expression $\exists x\, \text{synsem-rel}(x)$ describes every object in $\mathsf{I}$ which has a component that is in the synsem-rel relation. The expression $\forall x\, \text{synsem-rel}(x)$ describes every object in $\mathsf{I}$ all of whose components are in the synsem-rel relation.<sup>15</sup>

Now assume we have a description much like (7) in our grammar theory, but instead of defining the meaning of synsem-rel, it defines the meaning of append: the new description says that for every object in a grammar model which contains three (not necessarily pairwise distinct) lists as components, the lists are in the ternary append relation as a triple $\langle l_1, l_2, l_3 \rangle$ iff $l_3$ is the concatenation of $l_1$ and $l_2$ (in that order). Then we can use this append relation in yet another grammar principle as follows:<sup>16</sup>

$$\text{(8)} \quad \textit{head-filler-phrase} \Rightarrow \exists \boxed{1}\, \exists \boxed{2}\, \exists \boxed{3} \left( \begin{bmatrix} \text{PHON} & \boxed{3} \\ \text{NON-HD-DTRS}|\text{FIRST}|\text{PHON} & \boxed{1} \\ \text{HEAD-DTR}|\text{PHON} & \boxed{2} \end{bmatrix} \land \text{append}(\boxed{1}, \boxed{2}, \boxed{3}) \right)$$

The filler daughter is the only non-head daughter in a head-filler phrase. In English, the phonology of the filler daughter precedes the phonology of the head daughter. According to (8), a head-filler phrase has three components, 1 , 2 , and 3 such that they are the list values of the PHON attributes of the non-head daughter, the head daughter and the phrase as a whole, and they are in the append relation (in the given order). But being in the append relation means that list 3 is the concatenation of list 1 and list 2 . Obviously, the denotation of relational formulæ works as intended in grammar models.
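Stated operationally, the intended denotation of append is simple to verify; here is an illustrative Python check with PHON values as plain lists (the example strings are assumptions, not from the chapter):

```python
def in_append(l1, l2, l3):
    """<l1, l2, l3> is in the append relation iff l3 concatenates l1 and l2."""
    return l3 == l1 + l2

filler_phon = ["what"]
head_phon = ["did", "kim", "see"]

assert in_append(filler_phon, head_phon, ["what", "did", "kim", "see"])
assert not in_append(head_phon, filler_phon, ["what", "did", "kim", "see"])  # order matters
```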

Linguists use grammars to make predictions about the grammatical structures of languages. In classical generative terminology, a grammar undergenerates if there are grammatical structures it does not capture. It overgenerates if it permits

<sup>15</sup>Of which there are none, given the usual structure of signs where *synsem* objects always have components of other sorts.

<sup>16</sup>In RSRL syntax, (8) can be written as $: \sim \textit{head-filler-phrase} \rightarrow \exists x_1\, \exists x_2\, \exists x_3\, (:\!\text{PHON} \approx x_3 \land\ :\!\text{HEAD-DTR PHON} \approx x_2 \land\ :\!\text{NON-HD-DTRS FIRST PHON} \approx x_1 \land \text{append}(x_1, x_2, x_3))$.


structures that are deemed ungrammatical. It is uncontroversial that an appropriate notion of the meaning of a grammar should support linguists in making such predictions with their grammars. However, the notion of $\Gamma$ models in Definition 18 is not strong enough for this purpose. To see this, suppose there is a signature $\Sigma$ which is fit to describe the entire English language, and there is a theory $\theta$ which expresses correctly all and only what there is to say about English. Interestingly, a $\langle \Sigma, \theta \rangle$ model $\mathsf{I}$ of this perfect grammar of English can be arbitrarily small, as long as every object in the $\Sigma$ interpretation $\mathsf{I}$ is described by every grammar principle in $\theta$, as this is a condition on models of a grammar. Therefore a $\langle \Sigma, \theta \rangle$ model of our perfect grammar may consist of nothing but a structure of the single sentence *Elon is going to Mars*. This follows from the definition of $\Gamma$ models, because any appropriate grammar of English must describe all objects that together make up this well-formed sentence. But this one-sentence model of the grammar of English is obviously too small to count as a good candidate for the English language, because English contains much more than this single sentence. It follows that in arbitrarily chosen models, it cannot be detected if a grammar undergenerates or overgenerates.

King's (1999) *exhaustive models* offer a way to define the meaning of grammars such that the models reflect the basic expectations of generative linguists. The underlying intuition is to choose a maximal model which contains a congruent copy of any configuration of objects which can be found in some model of the grammar. This way, the model chosen for the meaning of a grammar is in a relevant sense big enough so that all the consequences of the grammar can be observed in it. If the grammar overgenerates, the model will contain ill-formed structures. If the grammar undergenerates, expected well-formed structures will be absent.

The simplest way to spell this out is by considering each and every alternative model $\mathsf{I}'$ of a grammar and observing that whenever you can describe something in an alternative model $\mathsf{I}'$ with an arbitrary set of descriptions, that set of descriptions also picks out something in the targeted, sufficiently large model $\mathsf{I}$:

**Definition 19** *For each grammar* $\Gamma = \langle \Sigma, \theta \rangle$, *for each* $\Sigma$ *interpretation* $\mathsf{I}$,

$\mathsf{I}$ *is an* exhaustive $\Gamma$ model *iff* $\mathsf{I}$ *is a* $\Gamma$ *model, and for each* $\theta' \subseteq D^0_\Sigma$, *for each* $\Sigma$ *interpretation* $\mathsf{I}'$, *if* $\mathsf{I}'$ *is a* $\Gamma$ *model and* $\Theta_{\mathsf{I}'}(\theta') \neq \emptyset$, *then* $\Theta_\mathsf{I}(\theta') \neq \emptyset$.

Any grammar with a non-empty model also has a non-empty exhaustive model. In addition to being a model of a given grammar $\Gamma = \langle \Sigma, \theta \rangle$, an exhaustive $\Gamma$ model $\mathsf{I}$ has the property that each arbitrarily chosen set of descriptions $\theta'$ which denotes anything at all in any $\Gamma$ model also denotes something in $\mathsf{I}$. An alternative algebraic way to characterize this requirement is to say that any configuration of objects in any $\Gamma$ model has a congruent counterpart in an exhaustive $\Gamma$ model. At the same time, since an exhaustive model is from a special class of *models*, if a description in $\theta$ does not describe some object in a $\Gamma$ interpretation $\mathsf{I}'$, then this object in $\mathsf{I}'$ cannot have a counterpart in an exhaustive $\Gamma$ model.

This is sufficient to capture relevant grammar-theoretic notions of linguistics: a grammar Γ of a language L overgenerates iff an exhaustive Γ model contains configurations that are not (congruent to) grammatical expressions in L; it undergenerates iff an exhaustive Γ model does not contain configurations which are (congruent to) grammatical expressions in L.

# **6 Alternative conceptions of the meaning of grammars**

Section 2 gave an informal overview of four different ways to conceive of models which explain the meaning of HPSG grammars: Theory T1 of Pollard & Sag (1994) views the adequate model as a collection of the object types of the expressions of the language L that a given grammar describes. T2 by King (1999) takes the intended model to be one from a class of models which contains all possible linguistic tokens of L. T3 (Pollard 1999) constructs the model of a grammar $\Gamma$ of language L as a collection of mathematical idealizations such that each grammatical structure of L should find a structurally isomorphic counterpart in the model. This model is called the *strong generative capacity* of grammar $\Gamma$. And T4 by Richter (2007) defines a schematic extension to grammars called their normal form which guarantees the existence of a model (a minimal exhaustive model) in which all and only the grammatical utterances of L find exactly one structurally matching configuration each, without committing to the ontological status of the configurations in the model.

All four share the common core of aiming at capturing the predictions of a grammar in the sense of directly reflecting possible overgeneration or undergeneration (Section 5): all and only the grammatical structures of L are supposed to be in the intended model or to find a corresponding counterpart in it. The significant differences between T1, T2, T3 and T4 reside in their assumptions about the nature of the model. The decision of what kind of entities populate the model determines the ontological and structural properties of the entities in the model, which in turn leads to substantial technical differences in the construction of the models. The four theories T1–T4 are numbered chronologically in the order in which they were developed.


Deviating from chronological order, we begin with T2, the theory of exhaustive models (Definition 19). T2 has the distinguished property of insisting on a *token* model of the language L of a given grammar, $\langle \Sigma, \theta \rangle$. According to T2, actual well-formed linguistic tokens are the immediate object of grammatical description. They are the objects in the intended exhaustive model $\mathsf{I} = \langle \mathsf{U}, \mathsf{S}, \mathsf{A}, \mathsf{R} \rangle$. For any occurrence of an utterance of L in the real world, the intended exhaustive model contains the actual utterance itself. Since linguists cannot know how often an utterance of a concrete token in L did occur and will occur in the world, exhaustive models are a class of models. For T2 it does not matter how often the token utterance *Elon is going to Mars* is encountered at a concrete place and time in the world, because among the class of exhaustive models of English there is one with the correct number of occurrences for this utterance and all other actual utterances, and that exhaustive model is the intended one. However, there is a crucial complication: it is clear that most conceivable well-formed expressions of any given human language were never produced and never will be. Since, by construction, an exhaustive model must contain all potential well-formed expressions of a language which obey the principles of grammar, in addition to actual utterance tokens, the theory of exhaustive models must admit *potential tokens* in the intended exhaustive model for those utterances which never occur in the real world. If token models are already suspicious (or unacceptable) to many linguists, models comprising non-actual tokens are even more contentious.

T2 is designed in deliberate opposition to the chronologically preceding theory T1 of Pollard & Sag (1994), the only one which employs feature structures. T1 proposes that a grammar Γ = ⟨Σ, θ⟩ denotes a set of mathematical representations of *types* of linguistic events. The main idea is that the object types abstract away from individual circumstances of token occurrences, because for T1 a grammar of a language is assumed not to be concerned with individual linguistic events or tokens. The object types capture individual linguistic token events in the sense that an object type conventionally corresponds to "those imaginable linguistic objects that are actually predicted to be possible ones" (Pollard & Sag 1994: 7) in the language L that ⟨Σ, θ⟩ describes. The postulated intuitive correspondence is not explicated further, but it is expected that a trained linguist will recognize which object type a linguistic token encountered in the real world corresponds to. When observing a token expression of English in the world, for example in a situation in which someone exclaims *Elon is going to Mars!*, the linguist recognizes the corresponding object type. The informality of the relationship between the denotation of a grammar (mathematical objects serving as object types) and the domain of empirically measurable events (utterances of grammatical expressions of a language) is one of the reasons to reject T1. In addition to the weak connection between the object types and the domain of empirically accessible data, object types have been criticized for being ontologically dubious and in any case superfluous, and thus falling victim to Occam's razor. A theory of meaning without such an additional ontological postulate is deemed to be stronger.

T1 is implemented by constructing linguistic object types as abstract feature structures. In a first approximation – to be refined presently – these can be thought of as rooted directed graphs or, in terms of our previous grammar models, as configurations of objects under a root node. Definition 11 introduced C<sup>I</sup> as the set of components of an object in an interpretation I. The root node of the directed graph corresponds to the distinguished object in a set C<sup>I</sup>. The abstract feature structures used as mathematical representations of object types, however, are not graph-like objects, as two distinct graphs could be isomorphic, in violation of the core idea of proposing unique object types for classes of linguistic events. Abstract feature structures are therefore defined as (tuples of) sets, representing each node in the graph as an equivalence class of the paths that lead to it from the root node. A labeling function assigns sorts to these abstract nodes in accordance with the feature appropriateness function of the signature, and relations are basically tuples of abstract nodes. A satisfaction function determines what it means for a feature structure to satisfy a description, which is then elaborated in the notion of grammars admitting sets of abstract feature structures. In terms of the exhaustive models of T2, the abstract feature structures admitted by a grammar Γ can be imagined as a normal form representation, with the abstract feature structures (the linguistic types) serving as the objects in a canonical exhaustive model I of Γ.<sup>17</sup> The earlier ontological criticism of T1 amounts to rejecting the insinuation that linguists consider (abstract) feature structures the subject of their grammars and affirming that their real interest lies in the description of languages. Assuming the existence of abstract feature structures is then a superfluous detour in the linguistic enterprise.
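To make the path-equivalence construction concrete, the following sketch computes an abstract representation from a concrete rooted graph. It is an illustration only: the encoding of paths as tuples, the `abstract_fs` helper, and the restriction to acyclic graphs are our simplifying assumptions, not part of the formal definitions referred to above (which also cover relations and the full appropriateness conditions).

```
from typing import Dict, FrozenSet, Set, Tuple

Path = Tuple[str, ...]            # a sequence of attribute names from the root
AbstractNode = FrozenSet[Path]    # an abstract node = an equivalence class of paths

def abstract_fs(edges: Dict[Tuple[int, str], int],
                sorts: Dict[int, str],
                root: int) -> Dict[AbstractNode, str]:
    """Abstract away from concrete node identities: collect, for each node
    reachable from the root, the set of attribute paths leading to it, and
    label that path set with the node's sort.  (Assumes an acyclic graph;
    cyclic structures need the equivalence-class definition directly.)"""
    paths: Dict[int, Set[Path]] = {root: {()}}
    agenda = [(root, ())]
    while agenda:
        node, path = agenda.pop()
        for (source, attribute), target in edges.items():
            if source == node:
                extended = path + (attribute,)
                targets = paths.setdefault(target, set())
                if extended not in targets:
                    targets.add(extended)
                    agenda.append((target, extended))
    # The labeling function assigns a sort to each equivalence class of paths.
    return {frozenset(ps): sorts[n] for n, ps in paths.items()}

# Two isomorphic graphs with different node identities: a root of sort t
# whose attributes F and G share a single target of sort u (re-entrancy).
g1 = abstract_fs({(0, "F"): 1, (0, "G"): 1}, {0: "t", 1: "u"}, root=0)
g2 = abstract_fs({(7, "F"): 3, (7, "G"): 3}, {7: "t", 3: "u"}, root=7)
assert g1 == g2   # one abstract structure per isomorphism class of graphs
```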

Meaning theory T3 is positioned against the theory T1 of object types for classes of theoretically indistinguishable linguistic tokens, and against the theory T2 of perceiving the meaning of a grammar in an intended exhaustive model populated with actual and non-actual linguistic tokens. With T3, Pollard (1999) is firmly opposed to token models and sees mathematical idealizations as fundamental to grammatical meaning. The concept of non-actual tokens is deemed unacceptable and self-contradictory. However, Pollard (1999) also rejects T1's ontological commitment to object types and wants to strengthen the relationship between the structures in the denotation of a grammar and empirically observable token expressions. According to T3, no two structures in the *strong generative capacity*, the collection denoted by a grammar ⟨Σ, θ⟩ of language L, are structurally isomorphic, and each utterance token of language L which is judged grammatical finds a structurally isomorphic counterpart in the grammar's strong generative capacity. An occurrence of the question *Is Elon really going to Mars?*, just like the occurrence of any other grammatical token of English, must find a unique structurally isomorphic mathematical idealization in the strong generative capacity of an adequate grammar of English. With this requirement, T3 tightens the connection between observables and the mathematical model, cutting out the types and establishing a much stricter link between the predictions of a grammar and the domain of empirical phenomena than the abstract feature structure models of Pollard & Sag (1994) offer with their appeal to conventional correspondence.

<sup>17</sup>This characterization is slightly simplistic; see Richter (2004: Appendix A, Definition 80) for details. Abstract feature structures are in fact extended to *canonical entities* to obtain canonical interpretations/models/exhaustive models.

T3 is spelled out on the basis of models (Definition 18),<sup>18</sup> offering three alternative ways of characterizing the strong generative capacity of a grammar. The structures in Pollard's models can be understood as pairs of an interpretation I = ⟨U, S, A, R⟩ and a root node whose set of components (C<sup>I</sup>) constitutes I's universe U. The objects in C<sup>I</sup> are all defined as canonical representations by a construction employing equivalence classes of attribute paths originating at the root node: given a grammar Γ, its strong generative capacity is the set of all such canonical representations whose interpretations are models of Γ. By construction, they are all pairwise non-isomorphic, and, with their internal (set-theoretic) structure, they can be assumed to be structurally isomorphic to grammatical utterance tokens of a language, in contrast to the abstract feature structures of Pollard & Sag (1994). The canonical representations in the strong generative capacity can be abstracted from each exhaustive model.

A central tenet of theories T1 and T3 of the meaning of grammars as sets of abstract feature structures and as mathematical idealizations in the strong generative capacity is the one-to-one correspondence either of object types or of mathematical idealizations to (linguistically indistinguishable groups of) grammatical utterances in a language. Richter (2007) investigates the models of existing HPSG grammars, such as the fragment of English developed in Pollard & Sag (1994), and notes that T1 and T3 necessarily trigger an unintended one-to-many relationship between grammatical utterances and structures in the denotation of typical HPSG grammars: one token utterance leads to more than one structure in the grammar denotation. The main reason is that, in both theories, each structure which corresponds to a grammatical utterance entails the presence of a large number of further structures. For the strong generative capacity, the additional structures come from the substructural nodes in the mathematical idealization of an utterance which, by design, must in turn function as root nodes of admissible structures. But these additional structures are not mathematical idealizations of empirically observable grammatical utterances. In fact, many of the structures present in the strong generative capacity do not correspond to structures which can occur in grammatical utterances at all.<sup>19</sup> While the abstract feature structures of T1 do not have substructures, the abstract feature structure admission relation relies on a mechanism with exactly the same effect: admitting the unique type of *Elon must be on his way to Mars* entails the existence of many other types, so-called reducts of the intended type, and these reducts do not have empirical counterparts in linguistic utterance tokens.

<sup>18</sup>Pollard (1999) is in fact based on Speciate Re-entrant Logic (SRL), King's precursor of RSRL, but a straightforward extension to full RSRL is provided in Richter (2004).

In response to these problems, T4 proposes *normal form grammars*, schematic signature and theory extensions applicable to any HPSG grammar. The core idea behind the canonical grammar extension is to partition the denotation of grammars into utterances and to guarantee by construction that every *connected configuration of objects* in a grammar's denotation is isomorphic to an utterance token in a language. For T1 and T3, this extension is insufficient to establish the intended one-to-one correspondence between observable utterances and object types or mathematical idealizations, because the structures predicted by T1 and T3 still generate additional linguistic types or mathematical idealizations corresponding to each feature structure reduct or substructure, respectively. However, normal form grammars allow the definition of *minimal exhaustive models*, because normal form grammars can be shown to have exhaustive models which contain non-isomorphic connected configurations of objects with the special property that each of these configurations corresponds to a grammatical utterance. According to T4, *Elon must be on his way to Mars* corresponds to exactly one connected configuration in the minimal exhaustive model of a perfect grammar of English, and so does any other well-formed English utterance. Proposal T4 is not forced to make any assumptions about the ontological status of the inhabitants of minimal exhaustive models of normal form grammars, since they do not have to be defined as a particular kind of mathematical structure (nor is this option excluded if it is desired).<sup>20</sup> T4 shares with T3 the commitment to providing an isomorphic structure to each grammatical utterance of a given language rather than just a corresponding linguistic type. With King's theory T2, it shares the avoidance of mathematical entities representing linguistic facts.

<sup>19</sup>See Richter (2007: Section 4) for extensive discussion and examples.

<sup>20</sup>The techniques enlisted in the construction of mathematical idealizations in T3 can easily be adapted to this end.


HPSG is among a small group of grammar formalisms with a very precise outline of their formal foundations. This high degree of precision extends to the different but closely related ways of characterizing the meaning of grammars. The differences are in part of a very technical nature, but under the technical surface, they are due to different opinions about what grammars ought to describe. It is an advantage of HPSG as a grammar framework that all these approaches are built on the same explicit logical foundations. As a consequence, their relationships can be studied with the rigorous tools of mathematical logic. The philosophical debate regarding the adequacy of each interpretation of the nature and purpose of grammars is thus grounded in concrete mathematical structures. Finally, independent of philosophical arguments and preferences, proposal T1, enlisting typed feature structures as canonical structures in models, provides a bridge to the literature on feature logics, connecting linguistic theory to an interesting set of efficient computational methods pursued in other chapters of the present handbook (Bender & Emerson 2021, Chapter 25 of this volume). This connection to computation and the rich literature on feature structures is unaffected by whether feature structure models are deemed adequate for linguistic theory.

# **Acknowledgments**

I would like to thank Adam Przepiórkowski and the reviewers Jean-Pierre Koenig, Stefan Müller and Manfred Sailer for their helpful suggestions, which improved this chapter considerably.

# **References**




# **Chapter 4**

# **The nature and role of the lexicon in HPSG**

Anthony R. Davis Southern Oregon University

Jean-Pierre Koenig

University at Buffalo

This chapter discusses the critical role the lexicon plays in HPSG and the approach to lexical knowledge that is specific to HPSG. We describe the tenets of lexicalism in general, and discuss the nature and content of lexical entries in HPSG. As a lexicalist theory, HPSG treats lexical entries as informationally rich, representing the combinatorial properties of words as well as their part of speech, phonology, and semantics. Thus many phenomena receive a lexically-based account, including some that go beyond what is typically regarded as lexical. We turn next to the global structure of the HPSG lexicon, the hierarchical lexicon and inheritance. We show how the extensive type hierarchy employed in HPSG accounts for lexical generalizations at various levels and discuss some of the advantages of default (nonmonotonic) inheritance over simple monotonic inheritance. We then describe lexical rules and their various proposed uses in HPSG, comparing them to alternative approaches to relate lexemes and words based on the same root or stem.

# **1 Introduction**

The nature, structure, and role of the lexicon in the grammar of natural languages has been a subject of debate for at least the last 50 years. For some, the lexicon is a prison that "contains only the lawless", to borrow a memorable phrase from Di Sciullo & Williams (1987: 3), and not much of interest resides there. In some recent views, the lexicon records merely phonological information and some world knowledge about each lexical entry (see Marantz 1997). All of the action is in the syntax, save the expression of complex syntactic objects as inflected words. In contrast, lexicalist theories of grammar, and HPSG in particular, posit a rich and complex lexicon embodying much of grammatical knowledge.

Anthony R. Davis & Jean-Pierre Koenig. 2021. The nature and role of the lexicon in HPSG. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 125–176. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599824

This chapter has two principal goals. One is to review the arguments for and against a lexicalist view of grammar within the generative tradition. The other is to survey the HPSG implementation of lexicalism. In regard to the first goal, we begin with the reaction to Generative Semantics, and note developments that led to lexicalist theories of grammar such as Lexical Functional Grammar (LFG) and then HPSG. Central to these developments was the argument that lexical processes, rather than transformational ones, provided more perspicuous accounts of derivational morphological processes. The same kinds of arguments then naturally extended to phenomena like passivization, which had previously been treated as syntactic. Once on this path, lexical treatments of other prototypically syntactic phenomena — long distance extraction, *wh*-movement, word order, and anaphoric binding — were advanced as well, with HPSG playing a leading role.

But this does not mean that opposition to lexicalism melted away. Both Minimalism, in particular Distributed Morphology (Bruening 2018; Marantz 1997), and Construction Grammar (Goldberg 1995; Tomasello 2003; van Trijp 2011) claim that lexicalist accounts fail in various ways. We discuss some of these current issues, including the apparent occurrence of syntactically complex structures in the lexicon, word-internal ellipsis, and endoclitics, each of which poses challenges for those who advocate a strict separation between lexical and syntactic processes. While we maintain that the anti-lexicalist arguments are not especially strong, and the phenomena they are based on somewhat marginal, we acknowledge that these questions are not yet settled. We then turn to the specifics of the lexicon as modeled within HPSG. Lexicalism demands, of course, that lexical entries be informationally rich, encoding not merely idiosyncratic properties of a single lexical item like its phonology and semantics, but also more general characteristics like its combinatorial possibilities. We outline what HPSG lexical entries must contain, and how that information is represented. This leads naturally to the next topic: with so much information in a lexical entry, and so much of that repeated in similar ones, how is massive redundancy avoided? The hierarchical lexicon, in which individual lexical entries are the leaves of a multiple inheritance hierarchy, is a core component of HPSG. Types throughout the hierarchy capture information common to classes of lexical entries, thereby allowing researchers to express generalizations at various levels. Just as all verbs share certain properties, all transitive verbs, all verbs of caused motion, and all transitive verbs of caused motion share additional properties, represented as constraints on types within the hierarchy. We draw on examples from linking, gerunds, and passive constructions as illustrations, but many others could be added.

Constraints specified on types in the hierarchy are deemed to be inherited by their subtypes, but monotonic inheritance of this kind runs into vexing issues. Most obviously, there are irregular morphological forms; any attempt to represent, say, the phonology of English plurals as a constraint on a plural noun class in the hierarchical lexicon must then explain why the plural of *child* is *children* and not \**childs*. Beyond this simple example, there are ubiquitous cases of lexical generalizations that are true by default, but not always. Various mechanisms for modeling default inheritance have therefore been one focus within HPSG, and we furnish an example of their use in modeling the properties of gerunds in English and other languages.

Finally, we discuss lexical rules and their alternatives. Along with the "vertical" relationships between classes of lexical entries modeled by types and their subtypes in the hierarchical lexicon, there is a need for "horizontal" relationships between lexical entries that are based on a single root or stem, such as forms of inflectional paradigms. Yet formalizing lexical rules adequately within HPSG has proven surprisingly difficult; specifying just what information is preserved and what is changed by a lexical rule is one prominent issue. We conclude this chapter by describing alternatives to lexical rules. One is to appropriately underspecify properties of lexical entries so that they cover all relevant variants of a single lexeme or word.<sup>1</sup> The second augments the type hierarchy via online type construction, extending the predefined lexical types specified in the hierarchy to include "virtual types" that combine the information from multiple predefined types.

# **2 Lexicalism**

# **2.1 Lexicalism and the origins of HPSG**

Lexicalism began as a reaction to Generative Semantics, which treated any regularity in the structure of words (derivational patterns, broadly speaking) as only epiphenomenally a matter of word structure and underlyingly as a matter of syntactic structure (see Lakoff 1970, among others). In the Generative Semantics view, all grammatical regularities are a matter of syntax (much of it, in fact, logical syntax). Chomsky (1970) presented many arguments that lexical knowledge differs qualitatively from syntactic knowledge and should be modeled differently. Jackendoff (1975) provides an explicit model of lexical knowledge that follows Chomsky's insights, although it focuses exclusively on derivational morphological processes. The main insight that Jackendoff formalizes is that relations between stems or words (say, between *destruct* and *destruction*) are to be modeled not via a generative device but through a redundancy mechanism that measures the relative complexity of a lexicon where these relations are present or not present (the idea is that a lexicon where *destruct* and *destruction* are related is simpler than one where they are not). Bochner (1993) is the most formalized and detailed version of this approach to lexical relations. Lexicalist approaches, including LFG and HPSG, took their lead from Jackendoff's work. LFG has relied heavily on treating relations between stems and between words as lexical rules, rather than the kind of generative devices that one finds in syntax. But, as accounts of linguistic phenomena in LFG focused increasingly on the lexicon, the question arose of whether lexical rules retain the character of redundancy rules or turn into yet another kind of generative device. Consequently, the necessity of lexical rules has been questioned as well (see Koenig & Jurafsky 1995 and Koenig 1999: 29–49 for potential issues that arise once lexical rules are assumed to be involved in the creation of new lexical entries).

<sup>1</sup> It is common since the late 1990s to distinguish between lexemes and words in HPSG, with, for example, some lexical rules mapping lexemes to lexemes (typically, derivational morphology) and some lexical rules mapping lexemes to words (typically, inflection); see Bonami & Crysmann (2018: 176–178) for general discussion and Runner & Aranovich (2003) for arguments that argument structure and valence features are specific to words, not lexemes. We do not further discuss the distinction between lexemes and words in this chapter for space reasons.

Another stream of research relying on a richly structured lexicon is Generative Lexicon theory (GL). Pustejovsky (1991; 1995) and Pustejovsky & Jezek (2016) present the elements of this approach to lexical representation, which focuses on semantic phenomena such as coercion and systematic polysemy. Within GL, lexical entries include, in addition to argument structure, an "event structure" and a "qualia structure", both of which play essential roles in GL accounts of semantic composition. For example, the natural interpretation of *enjoy the sandwich* as enjoying eating the sandwich arises from information in the event structures of *enjoy* and *sandwich* and the qualia structure of *sandwich*, which unify to yield this interpretation.

Lexicalism, at least within HPSG, embodies two distinct ideas. First is the idea that parts of words are invisible to syntactic operations (*Lexical Integrity*, see Bresnan & Mchombo 1995), so that relations between stems and between word forms cannot be the result of or follow syntactic operations, as in Distributed Morphology (Halle & Marantz 1993) or other linguistic models that assign no special status to the notion of word. Relations between words are therefore not modeled via syntactic operations (hence the appeal to Jackendoff's lexical rules early on and to unary branching rules more recently). Second is the idea that the occurrence of a lexical head in distinct syntactic contexts arises from distinct variants of words. For instance, the fact that the verb *expect* can occur both with a finite clause and an NP+VP sequence (see (1a) vs. (1b)) means that there are two variants of the verb *expect*, one that subcategorizes for a finite clause and one that subcategorizes for an NP+VP sequence.<sup>2</sup>

(1) a. I expected that he would leave yesterday.
    b. I expected him to leave yesterday.

Not all lexicalist theories, though, cash out these two distinct ideas the same way. The net effect of lexicalism within HPSG is that words and phrases are put together via distinct sets of constructions and that words are syntactic atoms. These two assumptions justify positing two kinds of signs, *phrasal-sign* and *lexical-sign*, and go hand in hand with the surface-oriented character of HPSG and what one might call a principle of surface combinatorics: if expression A consists of the concatenation of B and C (B ⊕ C), then all grammatical constraints that make reference to B and C are limited to A.

An evident concern regarding this view of the lexicon is the potential proliferation of lexical entries, replete with redundant information. Will it be necessary to specify all the information in these two variants of *expect* without regard for the large amount of duplication between them? Will the same duplication be needed for the verb *hope*, which patterns similarly (but not quite identically)? How will somewhat similar verbs, such as *foresee* and *anticipate*, which allow finite complements but not infinitive ones, be represented? We will describe HPSG's solutions to these questions below, in our discussion of the hierarchical lexicon. First, however, we turn to recent arguments against lexicalism, and then discuss in more detail just what kinds of information should be in HPSG lexical entries.

# **2.2 Recent challenges to lexicalism**

As there have been several challenges to lexicalism (see Bruening 2018 and Haspelmath 2011, among others), we now explore lexicalism and Lexical Integrity in HPSG in more detail. We first note that lexicalism does not imply that word and phrase formation are necessarily different "components", as is often claimed (see Marantz 1997, Bruening 2018). Some lexicalist approaches *do* assume that word formation and phrase building belong to two different components of a language's grammar (this is certainly true of Jackendoff 1975), but they need not. Within HPSG, there are approaches that treat every sign-formation (be it word-internal or word-external) as resulting from typed mother-daughter configurations; this is the hypothesis pursued in Koenig 1999, and it is also the approach frequently taken in implementations of large-scale grammars, where lexical rules are modeled as unary-branching trees; see the English Resource Grammar (Copestake 2002) and the grammars developed in the CoreGram project (Müller 2015) (see Müller 2018b: 58 for a similar point in his response to Bruening's paper).

<sup>2</sup>As this chapter is an overview of the approach to lexical knowledge HPSG embodies rather than a description of particular HPSG analyses of phenomena, we will sample liberally from various illustrative examples and simplify the analyses whenever possible so that readers can see the forest and not get lost in the trees.

Furthermore, recent approaches to inflectional morphology within HPSG model realizational rules through the very same tools the rest of a language's grammar uses (see Crysmann & Bonami 2016 and Crysmann 2021, Chapter 21 of this volume). There are also analyses of the structure of phrases where the same analytical tools – multidimensional hierarchy of types and inheritance – developed to model lexical knowledge (see Section 4) are employed to model phrase-structural constructions (see Sag's 1997 analysis of relative clauses, for example). So, both in terms of the formal devices and in terms of analytical tools used to model datasets, words and phrases can be treated the same way in HPSG (although they need not be). Somewhat ironically, and despite claims to the contrary, word formation in the syntactocentric approach Marantz or Bruening advocates *does* make use of distinct formal machinery to model word formation, namely realizational rules and various readjustment rules, as well as fusion and fission rules, to model inflectional morphology (see Halle & Marantz 1993; Embick 2015).

With this red herring out of the way, we concentrate on the two most important challenges that Bruening (2018) and Haspelmath (2011) present to lexicalist views. The first challenge is cases of phrasal syntax feeding the lexicon, purportedly exemplified by sentences such as (2).

(2) I gave her a don't-you-dare! look.<sup>3</sup>

We can provisionally accept for the sake of argument Bruening's contention that *don't-you-dare!* is a word in (2), despite its reliance on the (unjustified) assumption that the secondary object in (2) involves N-N compounding rather than an AP N structure (we refer readers to Bresnan & Mchombo 1995 or Müller 2018b for counter-arguments to Bruening's claim). Crucially, though, examples such as (2) have no bearing on HPSG's model of lexical knowledge, as HPSG-style lexicalism does not preclude constructions that form words from phrases. Nothing, as far as we know, rules out constructions of the form *stem/word* → *phrase* in HPSG. The two assumptions underlying the HPSG brand of lexicalism we mentioned above do not preclude a *lexical-sign* having a *phrasal-sign* as sole daughter (although we do not know of any HPSG work that exploits this possibility), and examples such as (2) are simply irrelevant to whether HPSG's lexicalist stance is empirically correct.

<sup>3</sup>Bruening (2018: 3)

The second challenge to lexicalism presented in Bruening (2018) bears more directly on HPSG's assumption that words are syntactic atoms. Word-internal conjunction/ellipsis examples, illustrated in (3) (adapted from Bruening's (31a), p. 14), seem to violate the assumption that syntactic constraints cannot "see" the internal structure of words, as ellipsis in these kinds of examples seems to have access to the internal part of the word *over-application*. In fact, though, such examples do not violate Lexical Integrity if one enriches the representation of composite words (to borrow a term from Anderson 1992: Chapter 11) to include a representation of their internal phonological parts as proposed in Chaves (2008) and Chaves (2014).

(3) Over- and under-application of stress rules plagues Jim's analysis.

Chaves' analysis assumes that the phonology of compound words and words that contain affixoids (to borrow a term from Booij 2005: 114–117) is structured. The MorphoPhonology or MP attribute of words (and phrases) is a list of phonological forms and associated morph information. The MP of compound words and words that contain affixoids includes a separate member for each member of the compound, or for the affixoid and stem. Thus in (3), the MPs of *overapplication* and *underapplication* each contain two elements: one for *over*/*under*, and one for *application*. Given this enriched representation of the morphophonology of words like *under/overapplication*, a single ellipsis rule can apply both to phrases and to composite words, eliding the second member of the word *overapplication*'s MP. As Chaves (p. 304) makes clear, such an analysis is fully compatible with Lexical Integrity, as there is no access to the internal structure of composite words, only to the (enriched) morphophonology of the entire word.
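The following toy sketch illustrates the general shape of such an analysis. The `Word` class, its `mp` list, and the `coordinate` function are our own illustrative assumptions, not Chaves' actual formalization; the point is only that ellipsis can target a member of a structured MP list without inspecting any morphosyntactic structure inside the word.

```
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    mp: List[str]                 # MorphoPhonology: one member per word part

over_application = Word(mp=["over", "application"])
under_application = Word(mp=["under", "application"])

def coordinate(first: Word, second: Word) -> str:
    """Elide the right-hand MP member of the first conjunct when it is
    shared with the second conjunct, as in 'over- and under-application'."""
    if first.mp[-1] == second.mp[-1]:
        return f"{first.mp[0]}- and {'-'.join(second.mp)}"
    return f"{'-'.join(first.mp)} and {'-'.join(second.mp)}"

print(coordinate(over_application, under_application))
# -> over- and under-application
```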

Haspelmath (2011) similarly challenges the view that syntactic processes may not access the internal structure of words, although Haspelmath's point is merely that what is a word is cross-linguistically unclear. So-called suspended affixation in Turkish (see (4), Haspelmath 2011: 48) also shows that word parts can be elided. We cannot discuss here whether Chaves' analysis can be extended to cases like (4), where suffixes are seemingly elided, or whether lexical sharing (where a single word can be the daughter of two c-structure nodes à la McCawley 1982), as proposed in Broadwell (2008), is needed.

(4) kedi ve köpek-ler-im-e (Turkish)
    cat and dog-PL-1SG-DAT
    'to my cat(s) and dogs'

What is important for current purposes is that these putative challenges to Lexical Integrity, such as (3) or (4), do not necessarily render a substantive version of it implausible. The same is true of another potential challenge to Lexical Integrity which neither Bruening nor Haspelmath discuss: endoclitics, to which we turn next.

Endoclitics are clitics that at least appear to be situated within a word, rather than immediately preceding or following it, as clitics often do. In many cases, endoclitics appear at morphological boundaries, as in the well-studied pronominal clitics of European Portuguese (Crysmann 2001). An approach similar to what we have referenced above for composite words and elided morphology may extend to these as well. But some trickier cases have also come to light, in which the clitic appears within a morpheme, not at a boundary. Two of the best documented cases come from the Northeast Caucasian language Udi (Harris 2000) (see (5)) and from Pashto (Tegey 1977; Roberts 2000; Dost 2007) (see (6)).

(6) b. push1=CL.1SG-push2.PF.PST.3SG (Pashto)
       'I pushed it.'

In these cases, as with clitics in general, there is a clash between the phonological criteria for wordhood, under which the clitics would be regarded as incorporated within words, and the syntactic constituency and semantic compositionality. But what makes these particularly odd is that these clitics are situated word-internally, even morpheme-internally. Udi subject agreement clitics such as *q'un* in (5) typically attach to a focused constituent, which can be a noun, a questioned constituent, or a negation particle as well as a verb (Harris 2000). Under certain conditions, as in (5), none of these options is available or permitted, and the clitic is inserted before the final consonant of the verb root, dividing it in two pieces, neither of which has any independent morphological status. Its position in this instance is apparently phonologically determined; it cannot appear word-finally or word-initially, and as there is no morphological boundary within the word, it must therefore appear within the monomorphemic root. Pashto clitics seek "second position", whether at the phrasal, morphological, or phonological level; *me* in (6) appears to be situated after the first stressed syllable (or metrical foot), which, in the case of (6b), also divides the verb into two parts that lack any independent morphological status.

<sup>4</sup>Harris (2000: 599)

<sup>5</sup>Tegey (1977: 92)

<sup>6</sup>Tegey (1977: 92)

If clitics are viewed as a syntactic phenomenon ("phrasal affixes", as Anderson 2005 puts it), these endoclitics must "see" into the internal structure of words (be it morphological, prosodic, or something else), thereby seemingly violating Lexical Integrity. Anderson's brief account invokes a reranking of Optimality Theoretic constraints from their typical ordering, whereby the clitic's positional requirements outrank Lexical Integrity requirements. Crysmann (2000) proposes an analysis, paralleling in many respects his account of European Portuguese clitics in Crysmann (2001), using Reape's constituent order domains (Reape 1994) and, in particular, Kathol's topological fields (Kathol 2000; see also Müller 2021b: Section 6.1, Chapter 10 of this volume). The "morphosyntactic paradox" in Udi, to borrow a phrase from Crysmann (2003: 373), is effectively "resolved on the basis of discontinuous lexical items"; this account then "parallels HPSG's representation of syntactic discontinuity" (Crysmann 2000).

For Pashto, researchers generally agree that the notion of second position is crucial, but that it can be defined at various levels — phrasal, lexical, and phonological. In this last case, clitics can appear within a word following the first metrical unit, as illustrated above. Dost (2007) invokes the mechanisms of word order domains (Reape 1994) and topological fields (Kathol 2000) at these various levels to account for this distribution of clitics. In this analysis, some words contain more than one order domain at the prosodic level. Lexical Integrity is preserved to the extent that, while domains at the prosodic level are "visible" to clitics in Pashto, syntactic processes do not reference the internal makeup of words.

Still, these accounts of endoclitics in Udi and Pashto appear to breach the wall of the strictest kind of Lexical Integrity, as they require access to some of the internal structure of lexical entries through a partial decomposition of their morphophonology into distinct order domains. Yet we would not wish to advocate models that permit unconstrained violations of Lexical Integrity, either. The troublesome cases we have noted here are relatively marginal or cross-linguistically rare; they seem to be limited in scope to prosodic or morphophonological information (e.g., ellipsis, insertion). As Broadwell (2008) points out when comparing possible analyses of Turkish suspended affixation, rejecting lexicalism altogether may lead to an unconstrained theory of the interaction between words/stems and phrases and thereby to incorrect predictions (e.g., that all affixes in Turkish can be suspended). Likewise, we would not expect to find a language in which endoclitic positioning is utterly unconstrained, where syntactic operations are sensitive to the fact that *anticonstitutional* is based on the nominal root *constitution*, or where coordination of affixes is always possible. Rejecting lexicalism leaves unexplained why such languages do not seem to exist, why the word-internal structure visible to syntactic operations (morphophonological structure) is so restricted, and why even that kind of morphophonological visibility is so rare (limited to particular affixoids and endoclitics, say).

# **3 Lexical entries in HPSG**

# **3.1 What are lexical entries?**

Because lexical entries (derived or not<sup>7</sup>) play a central role in accounting for the syntax of natural languages, lexical entries are informationally rich in HPSG. An additional consequence of HPSG's lexicalist stance is that there will be many lexical entries where one might at first glance expect a single entry. We will see below how HPSG handles multiple entries and classes of entries while avoiding redundancy, but it is important at the outset to clarify what a lexical entry is in HPSG. One misunderstanding about lexical entries conflates descriptions and the entities they describe, or, in other words, fails to distinguish between constructions in the abstract and a particular word or phrase (i.e., a lexical entry vs. a fully instantiated lexeme or word). As Richter (2021), Chapter 3 of this volume makes explicit, grammars in HPSG consist of *descriptions* of structures, and the lexicon thus consists of descriptions of what a fully specified lexeme or word can be. To see the importance of the distinction between descriptions (stored or derived entries) and the fully instantiated objects that are being described, consider HPSG's model of subcategorization with reference to the relevant portion of the tree for sentence (1a). HPSG's model of the dependency between heads and complements stipulates identity between the syntactic and semantic information of each complement (the value of the SYNSEM attribute) and a member of the list of complements the head subcategorizes for. Since there are indefinitely many SYNSEM values, on the assumption that there are indefinitely many clausal meanings (a point Jackendoff 1990: 8–9 emphasizes), there are, in principle, indefinitely many fully instantiated entries for the verb *expect* subcategorizing for a clausal complement (as in (1a)). But each of these fully instantiated entries for *expect* – one for each clausal sentence that corresponds to the tree in Figure 1 – corresponds to a single abstract description, and it is this description that the lexicon contains.

<sup>7</sup>Researchers working in the tradition of Höhle use the term *lexical entry* for lexical items that are stored in the lexicon and *lexical item* for all lexicon elements, that is, stored lexical items and those licensed by lexical rules. We will not make this distinction here.

Figure 1: Sharing of valence information in a head-complement phrase
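To make the description/object distinction concrete, here is a toy sketch in which a lexical entry is rendered as a predicate over fully specified signs (represented, purely for illustration, as nested dictionaries); one and the same description is satisfied by indefinitely many fully instantiated signs. The attribute names and values are simplifying assumptions, not the chapter's actual feature geometry.

```
def expect_entry(sign: dict) -> bool:
    """Description of the clausal-complement variant of 'expect': it
    constrains only what the entry stipulates, leaving the rest open."""
    return (sign.get("PHON") == "expect"
            and sign.get("HEAD") == "verb"
            and len(sign.get("COMPS", [])) == 1
            and sign["COMPS"][0].get("HEAD") == "comp")  # a finite clause

# Two distinct fully instantiated signs satisfying the same description:
s1 = {"PHON": "expect", "HEAD": "verb",
      "COMPS": [{"HEAD": "comp", "CONT": "leave(he)"}]}
s2 = {"PHON": "expect", "HEAD": "verb",
      "COMPS": [{"HEAD": "comp", "CONT": "go(elon, mars)"}]}
assert expect_entry(s1) and expect_entry(s2)
```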

The formal status of lexical entries has engendered a fair amount of theoretical work and some debate. We will touch on some aspects of this further below, in connection with online type construction. For further discussion of these kinds of issues, see Richter (2021), Chapter 3 of this volume and Abeillé & Borsley (2021), Chapter 1 of this volume.

# **3.2 What information is in lexical entries?**

Aside from the expected phonological and semantic information, specific to each lexeme or word, lexical entries include morphological information and information about their combinatorial potential. Morphosyntactic features can be part of the input to inflectional rules, but are also used to select the appropriate types of phrases (via their projections through the Head Feature Principle, see Abeillé & Borsley 2021: 22, Chapter 1 of this volume), as shown in (7). Some verbs, for instance, select for a PP headed by a particular preposition; others select for VPs whose verb is a gerund, or a bare infinitive, and so forth. Lexical entries thus include as much morphological information as both (inflectional) morphology and syntactic selection require.


(7) a. John conceived *of/\*about* the world's tastiest potato chip.
    b. John regretted *going/\*(to) go* to the party.

We illustrate the second leading idea behind HPSG or LFG's lexicalism – that there are different variants of lexical heads for the different contexts in which heads occur – with the French examples in (8). The verb *aller* 'go' in (8a) combines with a PP headed by *à* that expresses its goal argument and a subject that expresses its theme argument. The same verb in (8b) combines with the so-called non-subject clitic *y* that expresses its goal argument. We follow Miller & Sag (1997) and assume here that French non-subject clitics are prefixes. Since the context of occurrence of the head of the sentence, *aller*, differs across these two sentences (NP\_\_\_\_PP[*loc*] and NP *y*\_\_\_\_, respectively and informally), there will be two distinct entries for *aller*, one for each sentence, shown in (9) and (10) (we simplify the entries' feature geometry for expository purposes). Information in the entry in (10) that differs from the information in the entry in (9) appears in red; *p-aff* indicates that a member of ARG-ST is realized as a pronominal affix.

(8) a. Muriel va à Paris. (French)
       Muriel go.PRS.3SG to Paris
       'Muriel is going to Paris.'
    b. Muriel y va.
       Muriel there go.PRS.3SG
       'Muriel is going there.'

CATegory information in both entries contains part of speech information (including morphologically relevant features of verb forms), ARGument-STructure information, and valence information under SUBJ and COMPS. MORPH information includes stem form information, inflected form information (I-FORM), and, in case so-called clitics are present, the combination of the clitic and inflected form information. Both entries illustrate how informationally rich lexical entries are in HPSG. But postulating informationally rich entries does not mean stipulating all of the information within every entry. In fact, only the stem form and the relation denoted by the semantic content of the verb *aller* need to be stipulated within either entry. All the other information can be inferred once it is known which classes of verbs these entries belong to. In other words, most of the information included in the entries in (9) and (10) is not specific to these individual entries, an issue we take up in Section 4. As mentioned above, the informational difference between the two entries for *va* and *y va* is indicated in red in (10). The first difference between the two variants of *va* 'goes' is in the list of complements: the entry for *y va* does not subcategorize for a locative PP, since the affix *y* satisfies the relevant argument structure requirement. This difference in the realization of syntactic arguments (via phrases and pronominal affixes) is recorded in the types of the PP members of ARG-ST: *p-aff* in (10), but a PP headed by *à* in (9). Finally, the two entries differ in the FORM of the verb, which is the same as the inflected form of the verb in (9) (as indicated by identical tags), but not in (10), whose FORM includes the prefix *y*.
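Since the attribute-value matrices in (9) and (10) do not reproduce well here, the following sketch renders the gist of the two entries as nested dictionaries. The rendering is ours and heavily simplified: the attribute names follow the prose (MORPH, I-FORM, ARG-ST, SUBJ, COMPS), while the value notation (strings such as "PP[à, loc]" and "p-aff[y]") is an illustrative assumption.

```
# Entry (9): 'va' with a locative PP complement (Muriel va à Paris)
aller_pp = {
    "MORPH": {"STEM": "aller", "I-FORM": "va", "FORM": "va"},
    "CAT": {
        "HEAD":   {"POS": "verb", "VFORM": "finite"},
        "ARG-ST": ["NP", "PP[à, loc]"],
        "SUBJ":   ["NP"],
        "COMPS":  ["PP[à, loc]"],      # goal realized as a locative PP
    },
}

# Entry (10): 'y va' with the goal realized as the pronominal affix 'y'
aller_y = {
    "MORPH": {"STEM": "aller", "I-FORM": "va", "FORM": "y-va"},
    "CAT": {
        "HEAD":   {"POS": "verb", "VFORM": "finite"},
        "ARG-ST": ["NP", "p-aff[y]"],  # PP argument realized as an affix
        "SUBJ":   ["NP"],
        "COMPS":  [],                  # hence no PP complement in COMPS
    },
}
```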

One other question arises with regard to the information in lexical entries. Are there attributes or values that occur solely within lexical signs, and not in phrasal ones? If so, they would provide a diagnostic for distinguishing lexical signs from others. The ARG-ST list, which we included in the categorial information of signs in (9) and (10), might be regarded as a feature confined to lexical signs (see, among others, Ginzburg & Sag 2000: 361), on the premise that lexical items alone specify combinatorial requirements (but see Przepiórkowski 2001 for a contrary view, and see Müller 2021b: Section 7, Chapter 10 of this volume for other views questioning this assumption). But HPSG researchers have generally not explored this question in depth, and we will leave this issue here.

# **3.3 The role of the lexicon in HPSG**

As we hope is evident by now, the lexicon plays a critical role in HPSG's explanatory mechanisms, as lexical entries encode not merely their idiosyncratic phonological and semantic characteristics, but their distributional and combinatorial potential as well. Much of the information contained in lexical entries is geared towards modeling how words interact with one another, as we have already seen. As detailed in Davis, Koenig & Wechsler (2021), Chapter 9 of this volume, their combinatorial potential is recorded using two kinds of information: a list of syntactic arguments or syntactic requirements to be satisfied, and distinct lists that indicate how these requirements are to be satisfied (as local dependents, as non-local dependents, or as clitics/affixes). Not only are syntactic arguments recorded; so is their relative obliqueness (in terms of grammatical function), as per the partial hierarchy in (11) from Pollard & Sag (1992: 266).

```
(11) SUBJECT < PRIMARY OBJ < SECOND OBJ < OTHER COMPLEMENTS
```
We illustrate this explanatory role with the lexicon's part in HPSG's approach to binding, as described in Pollard & Sag (1992) (see Müller 2021a, Chapter 20 of this volume for details). As lexical entries of heads record both syntactic and semantic properties of their dependents, constraints between properties of heads and properties of dependents (e.g., subject-verb agreement) or between dependents (e.g., the binding constraints illustrated in (12)) can be stated, at least partially, as constraints on classes of lexical entries. The principle in (13) is such a constraint.

(12) a. Mathilda<sub>i</sub> saw herself<sub>i</sub> in the mirror.
     b. \* Mathilda<sub>i</sub> saw her<sub>i</sub> in the mirror.

(13) Principle A: a locally o-commanded anaphor must be locally o-bound (Pollard & Sag 1992).

Principle (13) is, formally, a constraint on lexical entries that makes use of the required information in an entry's argument structure regarding the syntactic and semantic properties of its dependents. The three argument structures in (14) illustrate permissible and ungrammatical entries. (14a) illustrates exempt anaphors, as there is no less oblique syntactic argument than the anaphoric NP (Müller 2021a: Section 2.3); (14b) illustrates a non-exempt anaphor properly bound by a less oblique, co-indexed non-anaphor; (14c) illustrates an ungrammatical lexical entry that selects for an anaphoric syntactic argument that is not co-indexed by a less oblique syntactic argument, despite not being an exempt anaphor (i.e., not being the least oblique syntactic argument).

(14) a. ⟨ARG-ST ⟨NP[*ana*]<sub>i</sub>, …⟩⟩
     b. ⟨ARG-ST ⟨NP<sub>i</sub>, NP[*ana*]<sub>i</sub>⟩⟩
     c. \* ⟨ARG-ST ⟨XP<sub>j</sub>, NP[*ana*]<sub>i</sub>⟩⟩
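The following sketch shows how a principle of this kind can be checked as a constraint on ARG-ST lists, with obliqueness encoded simply as list order, as in (11). The `Arg` representation and the exemption condition (anaphor in ARG-ST-initial position) are our simplifications of the prose above, not the chapter's formal statement.

```
from dataclasses import dataclass
from typing import List

@dataclass
class Arg:
    index: int            # referential index (coindexation = same index)
    anaphor: bool = False

def binding_ok(arg_st: List[Arg]) -> bool:
    """An anaphor with a less oblique co-argument (i.e., not ARG-ST-initial)
    must be coindexed with some less oblique argument; an ARG-ST-initial
    anaphor is exempt."""
    for position, arg in enumerate(arg_st):
        if arg.anaphor and position > 0:
            if not any(less.index == arg.index for less in arg_st[:position]):
                return False
    return True

assert binding_ok([Arg(1, anaphor=True)])              # (14a): exempt anaphor
assert binding_ok([Arg(1), Arg(1, anaphor=True)])      # (14b): properly bound
assert not binding_ok([Arg(2), Arg(1, anaphor=True)])  # (14c): ungrammatical
```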

Our purpose here is not to argue in favor of the specific approach to binding just outlined. Rather, we wish to illustrate that in a theory like HPSG where much of syntactic distribution is accounted for by properties of lexical entries, co-occurrence restrictions treated traditionally as constraints on trees (via some notion of command) are modeled as constraints on the argument structure of lexical entries. It is tempting to think of such a lexicalization of binding principles as a notational variant of tree-centric approaches. Interestingly, this is not the case, as argued in Wechsler (1999). Wechsler argues that the difference between argument structure and valence is critical to a proper model of binding in Balinese. Summarizing briefly, voice alternations in Balinese (e.g., objective or agentive voices) do not alter a verb's argument structure but do alter its valence – the subject and object it subcategorizes for. As binding is sensitive to relative obliqueness within ARG-ST, binding possibilities are not affected by voice alternations within the same clause, which are represented with different valence values. In the case of raising, on the other hand, the argument structure of the raising verb and the valence of the complement verb interact, as the subject of the complement verb is part of the argument structure of the raising verb. An HPSG approach to binding therefore predicts that voice alternations within the embedded clause will not affect binding of co-arguments of the embedded verb, but will affect binding of the raised NP and an argument of the embedded verb. This prediction seems to be borne out, as the Balinese examples in (15) show.

(15) a. Ia nawang awakne/ia∗ lakar tangkep polisi. (Balinese)
        3 AV.know self/3 FUT OV.arrest police
        'He knew that the police would arrest himself/him∗.'



Sentence (15a) shows that the proto-agent (the first element of ARG-ST) of the subject-to-object raising verb *nawang* 'know' can bind the raised subject (which in this case corresponds to the proto-patient of the complement verb *tangkep* 'arrest' since that verb is in the objective voice). Sentence (15b) shows that the raised (proto-agent) subject of the complement verb can bind its proto-patient argument. Critically, sentence (15c) shows that the raised proto-patient (second) argument of the complement verb can be bound by the complement verb's protoagent. The contrast between sentences (15b) and (15c) illustrates that while binding is insensitive to valence alternations (the same proto-agent binds the same proto-patient argument in both sentences), raising is not (the proto-agent argument is raised in (15b) and the proto-patient argument in (15c)). As Wechsler argues, this dissociation between valence subjects and less oblique arguments on the ARG-ST list is hard to model in a configurational approach to binding that equates the two notions in terms of c-command or the like. What is important for our purposes is that a "lexicalization" of argument structure, valence, and binding has explanatory power beyond tree configurations, illustrating some of the analytical possibilities informationally rich lexical entries create.


See also Müller (2021a: Section 5), Chapter 20 of this volume for a more detailed discussion of parallel data from Toba Batak.

# **3.4 Lexical vs. constructional explanations**

As we have noted above, HPSG posits that much of the combinatorics of natural language syntax is lexically determined; lexical entries contain information about their combinatorial potential and, as a consequence, if a word occurs in two distinct syntactic contexts, it must have two distinct combinatorial potentials. Under this view, phrase-structure rules are boring and few in number. They are just the various ways for words to realize their combinatorial potential. In the version of HPSG presented in Pollard & Sag (1994), for example, there are only a handful of general phrase-structural schemata, one for a head and its complements, one for a head and its specifier, one for a head and a filler in an unbounded dependency, and so forth, and the structure of clauses is relatively flat, in that relations between contexts of occurrence of words are established "at the lexical level" rather than through operations on trees that increase the depth of syntactic trees.

In a transformational approach, on the other hand, relations between contexts of occurrence of words are seen as relations between *syntactic* trees, and the information included in words can thus be rather meager. In fact, in some recent approaches, lexical entries contain nothing more than some semantic and phonological information, so that even part of speech information is something provided by the syntactic context (see Borer 2003; Marantz 1997). In some constructional approaches (Goldberg 1995, for example), part of the distinct contexts of occurrence of words comes from phrase structural templates that words fit into. So again, there can be a single entry for several contexts of occurrence.

HPSG's approach to lexical knowledge is quite similar to that of Categorial Grammar (to some degree this is due to HPSG's borrowing important aspects of its treatment of subcategorization from Categorial Grammar).<sup>8</sup> As in HPSG, the combinatorial potential of words is recorded in lexical entries, so that two distinct contexts of occurrence correspond to two distinct entries. The difference from HPSG lies in how lexical entries relate to each other. In many forms of Categorial Grammar (be it Combinatory or Lambek-calculus style), relations between entries are the result of a few general rules (e.g., type raising, function composition, hypothetical reasoning) (see Dowty 1978 for an approach that countenances lexical rules, though). The assumption is that those rules are universally available; however, those rules may be organized in a type hierarchy, and an individual language might avail itself of only a portion of this hierarchy, as in Baldridge (2002). Relations between entries in HPSG can be much more idiosyncratic and language-specific. We note, however, that nothing prevents lexical rules from constituting part of a Categorial Grammar (see Carpenter 1992a), so that this difference is not necessarily qualitative, but concerns how much of researchers' efforts are typically spent on extracting lexical regularities; HPSG has focused much more, it seems, on such efforts.

# **4 The hierarchical lexicon**

We have now seen that lexicalism demands that lexical entries be informationally rich, in order to encode what might otherwise be represented as syntactic rules. To avoid massive and redundant stipulation throughout the lexicon, we need mechanisms to represent the regularities within it. Two main mechanisms have been used in HPSG to achieve this. The first mechanism is the organization of information shared by lexical entries or parts of entries into a hierarchy of types, in a way quite similar to semantic networks within knowledge representation systems (see, among others, Brachman & Schmolze 1985). This hierarchy of types (present in HPSG since the beginning: Pollard & Sag 1987 and the seminal work of Flickinger et al. 1985, Flickinger 1987) ensures that individual lexical entries only specify information that is unique to them. The second mechanism is lexical rules, which relate variants of entries and, more generally, members of a lexeme's morphological family (which consists of a root or stem as well as all stems derived from that root or stem) or members of a word's inflectional paradigm.

<sup>8</sup>See also the chapters by Flickinger, Pollard & Wasow (2021: Section 5) on the history of HPSG and by Kubota (2021) for a comparison of HPSG and Categorial Grammar.

HPSG is, of course, not the only linguistic framework to exploit inheritance, although HPSG researchers, perhaps more than others, have emphasized its central role in expressing lexical generalizations. Appeals to similar mechanisms feature prominently in Generative Lexicon (GL) accounts of lexical semantics, for example. Both the lexical typing structure and qualia structures within GL, in particular the formal quale, have values situated in type hierarchies (Pustejovsky & Jezek 2016) and GL accounts of coercion and metonymy rely crucially on multiple inheritance within qualia values.

In this section, we discuss the hierarchical organization of the lexicon into cross-cutting classes of lexical entries at various levels of generality. We examine two distinct techniques for inheritance, which are not mutually exclusive. One is to create subtypes directly, with pertinent additional constraints stated for each subtype. Different classes of words are thus reified as subtypes of *word* (or *lexical-sign*) in the hierarchy, and all lexical items that belong to that subtype inherit its constraints. Another technique, more prevalent in current HPSG work, uses implicational statements: if certain properties hold of a lexical item (for example, if its AUX value is +), then others must hold as well (e.g., it subcategorizes for a VP complement whose subject is token-identical to the auxiliary verb's). These statements need not involve all of the information that is present in the entire *word*, so they need only refer to substructures within *word* objects, like their SYNSEM values.
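A toy sketch of such an implicational statement follows; the dictionary representation, the attribute names, and the `aux_constraint` checker are our illustrative assumptions, not an actual HPSG formalization. Note how token identity (structure sharing) is modeled as identity of objects, not mere equality of values.

```
def aux_constraint(word: dict) -> bool:
    """If a word's AUX value is +, it must take exactly one VP complement
    whose SUBJ value is token-identical to the word's own SUBJ value."""
    if word.get("AUX") is not True:
        return True                      # antecedent false: constraint holds
    comps = word.get("COMPS", [])
    return (len(comps) == 1
            and comps[0].get("CAT") == "VP"
            and comps[0].get("SUBJ") is word.get("SUBJ"))  # token identity

subj = {"CAT": "NP"}                     # one shared (token-identical) object
can = {"AUX": True, "SUBJ": subj, "COMPS": [{"CAT": "VP", "SUBJ": subj}]}
assert aux_constraint(can)
```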

# **4.1 Inheritance**

All grammatical frameworks classify lexical entries to some extent, of course. Basic part of speech information is one obvious case. This high-level classification is present in HPSG, too, as part of the hierarchy of types of heads. That information is recorded in the value of the HEAD feature. A simple hierarchy of types of heads is depicted in Figure 2.

Figure 2: A hierarchy of subtypes of *head*

Each of these types is (typically) a partial specification of a lexical entry's head properties. Typing of HEAD information allows the ascription of appropriate properties to different classes of lexical entries. For example, case information is only relevant to nouns in English, and whether a verb is an auxiliary or not is only relevant to verbs. The subtypes of *head* in Figure 2 allow us to define additional specifications of the properties appropriate for different parts of speech. For example, English lexical entries with a HEAD value of *noun* contain an attribute for CASE, while those for *verb* contain the attributes AUX, TENSE, and ASPECT, as shown in (16) (we use implicational statements in (16) to indicate feature appropriateness conditions for the types *noun* and *verb* for perspicuity only; such conditions would be part of the grammar's signature, see Richter (2021: 98), Chapter 3 of this volume).<sup>9</sup> In other words, the grammar's signature will specify that only for nouns (those lexical entries whose HEAD value is of type *noun*) is the attribute CASE appropriate. Similarly, only for verbs are the attributes AUX, TENSE, and ASPECT appropriate.

(16) a. If the attribute CASE is an attribute within a lexical entry's HEAD value, then the value of HEAD is of type *noun*.

	- b. If the attributes AUX, TENSE, or ASPECT are attributes within a lexical entry's HEAD value, then the value of HEAD is of type *verb*.

Typing of parts of speech thus lets us specify what it means for a part of speech to be a noun or a verb in a particular language (of course, there will be strong similarities in these properties across languages) and omit for individual noun and verb entries properties they share with all nouns and verbs.

The statements in (16) are in some sense merely definitional, as noted. But they allow us to state just once the general information that applies to whole classes of lexical entries. Thus, the pronoun *him* need only include the fact that it bears accusative case; the fact that it is a noun can be inferred. Similarly, the entry for

<sup>9</sup>Strictly speaking the logic works in both ways: the presence of features like CASE makes it possible to infer the presence of the type *noun* and the type *noun* requires the feature CASE to be there. We focus on the first implication here.

the verb *can* need only include the head information [AUX +] for us to be able to infer that it is a *verb* (assuming AUX is not an appropriate attribute for another type).
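
To make the inference in (16) and footnote 9 concrete, the following toy Python sketch (ours, not part of any HPSG implementation; all names are illustrative) treats a signature's appropriateness conditions as a table and infers a head type from the attributes a lexical entry mentions:

```python
# A minimal sketch of feature appropriateness as in (16); the attribute
# and type names are illustrative, not from any actual HPSG system.

APPROPRIATE = {
    "noun": {"CASE"},
    "verb": {"AUX", "TENSE", "ASPECT"},
}

def infer_head_type(attributes):
    """Return the unique head type for which all given attributes are
    appropriate, mirroring the inference from [AUX +] to *verb*."""
    candidates = [t for t, feats in APPROPRIATE.items()
                  if set(attributes) <= feats]
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"cannot infer a unique head type from {attributes}")

print(infer_head_type({"AUX"}))   # -> verb
print(infer_head_type({"CASE"}))  # -> noun
```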

# **4.2 Representing lexical generalizations**

So far, we have merely shown an HPSG implementation of a part of speech taxonomy, but once we consider subtypes with additional constraints, the utility of the hierarchical lexicon within a lexicalist framework becomes apparent. There are interesting generalizations to be made about more specific classes, such as transitive verbs, past participles, or predicators denoting caused motion (regardless of their part of speech). In the hierarchical lexicon, we can represent these "interesting" classes as types. Which classes are worth positing in the grammar of a given language depends on that language's properties; thus we expect lexical classes to specify a mix of cross-linguistically common (possibly, in some cases, universal) and language-particular constraints.

A seemingly straightforward way to "capture generalizations about the elements of the lexicon" is to posit a hierarchy of subtypes of *word*. Thus types such as *verb-word* and *noun-word* specify properties of verbs and nouns, and types such as *subj-control-pred* and *obj-control-pred* specify properties of predicates that exhibit subject and direct object control. Individual lexical items belong to multiple types in the hierarchy; the verbs *try* and *attempt* inherit the information from *verb-word* and *subj-control-pred*, while the nouns *attempt* and *effort* inherit the information from *noun-word* and *subj-control-pred*.

Ackerman & Webelhuth (1998) use this kind of hierarchy of subtypes of *word* in their accounts of German passives and other phenomena, which we will discuss briefly in the following section. In this case, the information involved in their account is both morphological and syntactic, and they propose a hierarchy of verb types at the *word* level.

However, a hierarchy of subtypes of *word* is, while formally feasible, potentially rather inelegant. Note first that types like *verb* and *noun* are already defined as subtypes of the type *head*. There is an obvious danger of redundancy if we additionally posit parallel subtypes of *word* such as *verb-word* and *noun-word*, serving no other function than as types with the corresponding HEAD values. Furthermore, signs in HPSG are structured objects, with their various kinds of information deliberately arranged in a way that associates pieces of information that "travel together." The information within HEAD, for example, is grouped there because it is all subject to the Head Feature Principle. Both part-of-speech and control information are found within SYNSEM, as phonological information
has no bearing on these things. So rather than creating subtypes of *word* to capture regularities in the lexicon, we would prefer to express those regularities as constraints on subtypes that encompass only the information that's pertinent. These are the smallest, "narrowest" portions of *word* objects that include all that information; the remaining portions of a *word* can be ignored in this context. In other words, we take advantage of HPSG's feature geometry and of the hierarchies of types appropriate for a particular substructure within signs to express generalizations as "locally" as possible (see Richter 2021, Chapter 3 of this volume).

Implicational statements can serve well for expressing generalizations as "locally" as possible; they constrain the range of possible values of attributes and can stipulate structure sharing among them. As a simple example, consider the possible complements of prepositions. Unlike verbs, which, at least in some languages, can have multiple elements on their COMPS lists, prepositions are limited to at most one. There are no ditransitive prepositions (as far as we are aware). The following statement expresses this generalization in English as well as more formally.

(17) a. If a lexical entry's HEAD value is of type *prep*, then its COMPS list is either empty or contains exactly one *synsem*.

$$\text{b. } \begin{bmatrix} \text{HEAD } \textit{prep} \end{bmatrix} \Rightarrow \begin{bmatrix} \text{COMPS } \langle\rangle \lor \langle \textit{synsem} \rangle \end{bmatrix}$$
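
Read procedurally, an implicational statement like (17) is simply a well-formedness check on feature structures. A minimal sketch (ours, with invented names):

```python
# A toy rendering of the implicational constraint in (17): prepositions
# take at most one complement; other heads are unconstrained here.

def satisfies_prep_comps(head, comps):
    """Check the constraint [HEAD prep] => [COMPS <> or <synsem>]."""
    if head == "prep":
        return len(comps) <= 1
    return True

print(satisfies_prep_comps("prep", ["NP"]))        # True
print(satisfies_prep_comps("prep", ["NP", "NP"]))  # False: no ditransitive Ps
print(satisfies_prep_comps("verb", ["NP", "NP"]))  # True: verbs may differ
```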

A more extensive example concerns linking of semantic roles to syntactic arguments, and is drawn from the work of Davis & Koenig (2000), Davis (2001), and Koenig & Davis (2003). Consider the examples in (18).

	- b. Rover dragged the toy to the den.
	- c. Rover jumped over the fence.

The mapping from semantic roles to subjects and objects in these sentences can be described by the following informally stated constraints:

	- b. Caused motion verbs link the causer to the subject and the moving entity, if distinct from the causer, to the direct object.
	- c. However, caused motion verbs in which the causer and moving entity are the same thing can link both to the subject (and needn't have a direct object).

The second and third statements are subcases of the first, so ideally we prefer to state the substance of the first statement just once, rather than repeat it. We could posit subtypes of *word*, along the lines of the approach mentioned above, such as *transitive-verb* and *caused-motion-verb*. But implicational statements provide an arguably simpler way to model the facts of linking. Since the constraints we wish to express concern both ARG-ST and CONT, our implications are stated on *local* objects, which are the minimal type of object containing these attributes. We presuppose here a hierarchy of semantic relation types as values of CONT, including *cause-rel*, *move-rel*, and their subtype *caused-motion-rel*, each of which licenses attributes for the required participant roles.

First, we require that the causer, denoted in (20b) by the value of ACT, be linked to the subject (the first element of ARG-ST):

(20) a. If a *synsem* object's CAT|HEAD value is of type *verb*, and its CONT value is of type *cause-rel*, then its value of CONT|ACT is token identical to the index of the first element of its ARG-ST list.

$$\text{b. } \begin{bmatrix} \text{CAT|HEAD } \textit{verb} \\ \text{CONT } \textit{cause-rel} \end{bmatrix} \Rightarrow \begin{bmatrix} \text{ARG-ST } \left\langle \text{NP}_{\boxed{1}}, \dots \right\rangle \\ \text{CONT } \left[ \text{ACT } \boxed{1} \right] \end{bmatrix}$$

Then, we link the moving entity in a caused motion verb, denoted in (21b) by the value of UND, to any NP on ARG-ST:

(21) a. If a *synsem* object's CAT|HEAD value is of type *verb*, and its CONT value is of type *move-rel*, then its value of CONT|UND is token identical to the index of some NP element of its ARG-ST list.

$$\text{b. } \begin{bmatrix} \text{CAT|HEAD } \textit{verb} \\ \text{CONT } \textit{move-rel} \end{bmatrix} \Rightarrow \begin{bmatrix} \text{ARG-ST } \left\langle \dots, \text{NP}_{\boxed{1}}, \dots \right\rangle \\ \text{CONT } \left[ \text{UND } \boxed{1} \right] \end{bmatrix}$$

Both of these implicational statements apply to a verb with a CONT value of type *caused-motion-rel*. Note that if the causer and the moving entity are distinct, they will be realized as separate NPs on the ARG-ST list. This is the linking pattern we find in numerous verbs, such as *throw*, *lift*, *expel*, and so on. In some cases, however, the causer and the moving entity may be one and the same. If the ACT and UND values are identical in CONT, then the second implication allows the moving entity to be realized as the subject, or as a reflexive direct object, as in:

(22) The kids deliberately rolled (themselves) down the hill.

What is ruled out by this pair of statements, though, is a hypothetical verb *quoll*, with a linking pattern like this:

(23) \* The rock quolled the kids down the hill. Intended: 'The kids rolled the rock down the hill.'

Additional restrictions may apply to some verbs of motion. For instance, many verbs of locomotion entail that the causer and moving entity are identical, and allow only an intransitive variant:

(24) The kids strolled (\*themselves) down the hill.

We could represent this identity using another constraint, solely within CONT, as follows, where the type *self-move-rel* is a subtype of *move-rel*:

(25) a. If a *synsem* object's CAT|HEAD value is of type *verb*, and its CONT value is of type *self-move-rel*, then its CONT|ACT and CONT|UND values are token identical.

$$\text{b. } \begin{bmatrix} \text{CAT|HEAD } \textit{verb} \\ \text{CONT } \textit{self-move-rel} \end{bmatrix} \Rightarrow \begin{bmatrix} \text{CONT } \begin{bmatrix} \text{ACT } \boxed{1} \\ \text{UND } \boxed{1} \end{bmatrix} \end{bmatrix}$$
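
The interaction of (20), (21), and (25b) can be emulated with a small consistency check. The following Python sketch is our simplification (string indices stand in for boxed tags, and the type hierarchy is reduced to a parent table):

```python
# A toy check of the linking constraints (20), (21), and (25b); the data
# model is ours, not Davis & Koenig's formalization.

SUBTYPES = {  # child type -> its immediate supertypes
    "caused-motion-rel": {"cause-rel", "move-rel"},
    "self-move-rel": {"move-rel"},
}

def supertypes(rel):
    seen, stack = set(), [rel]
    while stack:
        t = stack.pop()
        seen.add(t)
        stack.extend(SUBTYPES.get(t, ()))
    return seen

def check_linking(rel, act, und, arg_st):
    """arg_st is a list of indices; act/und are role-bearing indices."""
    types = supertypes(rel)
    if "cause-rel" in types and arg_st[0] != act:   # constraint (20)
        return False
    if "move-rel" in types and und not in arg_st:   # constraint (21)
        return False
    if "self-move-rel" in types and act != und:     # constraint (25b)
        return False
    return True

# 'Rover dragged the toy': causer = subject, mover = object -> licensed
print(check_linking("caused-motion-rel", act="i", und="j", arg_st=["i", "j"]))
# the impossible verb 'quoll' in (23): causer as object -> ruled out
print(check_linking("caused-motion-rel", act="j", und="i", arg_st=["i", "j"]))
```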

When we consider the most specific types of the lexical hierarchy, where individual lexical entries reside, the same kinds of constraints, pertaining solely to a given lexical entry's phonological form, inflectional class, specific semantics, register, and so forth, can be employed. This lexeme or word-specific information needs to be spelled out somewhere in any grammatical framework. We can now view this as just the narrowest, most particular case of specifying information about a class of linguistic entities. At the same time, information shared across a broader set of lexical entries need not be stated separately for each one. Thus, the phonology of the word *spray* and the precise manner of motion of the particles or liquid caused to move in a spraying event are unique to this lexical entry. However, much of its syntactic and semantic behavior – it is a regular verb, participating in a locative alternation, involving caused ballistic motion of a liquid or other dispersable material – is shared with other English verbs such as *splash*, *splatter*, *inject*, *squirt*, and *smear*. To the extent that these "narrow conflation classes", as Pinker (1989) terms them, are founded on clear semantic criteria, we can readily state syntactic and semantic constraints on the appropriate types in the relevant type hierarchy. Thus much of the semantics of a verb like *spray* need not be specified at the level of that individual lexical entry. Apart from the broad semantics of caused motion, shared by numerous verbs, the verbs in the narrow conflation class containing *spray* share the selectional restriction, noted above, that their objects are set in motion by an initial impulse and that they are liquid or particulate material. We might therefore posit a subtype of the type
*caused-motion-rel* to represent this shared semantics triggering the locative alternation, with further subtypes for the semantics of the individual verbs. Note that not all these constraints apply to precisely the same class (there are other verbs with somewhat different semantics, like *load* and *wrap*, exhibiting the locative alternation, for example), so several types might be required.

To sum up the import of these brief examples, the substance of the hierarchical lexicon need not be directly expressed in terms of subtypes of *word*, but rather in implicational statements that express constraints among types in the structures inside lexical entries. Interactions among these statements provide a way for classes of lexical items to inherit and share properties, so that they need not specify the same information over and over again.

# **4.3 Cross-cutting types in the lexicon**

Having now illustrated the use of implicational statements to specify constraints on classes of lexical entries at various levels of generality, we present in this section an example of cross-cutting types, each expressing some generalization about a class of words. Drawn from Ackerman & Webelhuth (1998), this sample analysis concerns German passives, which come in several varieties, each with its own constraints. Each passive construction uses a different auxiliary (*werden*, *sein*, or *bekommen*) and two of these constructions require a participial form of the verb, while the *sein* passive requires *zu* followed by an infinitive VP. Additionally, passives appear attributively, as NP modifiers, as well as predicatively. Here are two examples of the *zu* + infinitive passive, the first attributive, the second predicative:

	- b. weil die Blumen dem Mann von Johann zu schenken sind
		because the flowers the man by Johann to give are
		'because the flowers must be given to the man by Johann'

Ackerman & Webelhuth's account of German passives posits a multiple inheritance hierarchy of lexical types (note that these are all subtypes of a type *word*, not subtypes of values within it). A portion of their hierarchy of German passive types is shown in Figure 3. The suffix *-lci* on the names of types in this figure stands for "lexical combinatorial item", which is basically equivalent to lexical entry.

Figure 3: A portion of the hierarchy of passive lexical types according to Ackerman & Webelhuth (1998: 244)

While all passives share the constraint that a logical subject is demoted, as stipulated on a general *univ-pas-bas-lci* passive type, the other requirements for each kind of passive are stated on various subtypes. The *zu*+infinitive passive, for instance, requires not only that *sein* is the auxiliary and that the main verb is infinitive, but that the semantics involves some additional modal meaning. This differs from the other passives, which simply maintain the semantics of their active counterparts. However, the types of the passive verb *schenken(den)* in (26a) and (26b) both inherit from several passive verb supertypes. As mentioned, at a general level, there is information common to all German passives, or indeed to passives universally, namely that the "logical subject" (first element of the basic verb's ARG-ST list) is realized as an oblique complement of the passive verb, or not at all. A very common subtype, which Ackerman & Webelhuth also regard as universal, rather than specific to German, specifies that the base verb's direct object is realized as the subject of its passive counterpart; this defines personal passives. Once in the German-specific realm, an additional subtype specifies that the logical subject, if realized, is the object of a *von*-PP; this holds true of all three types of German personal passives. Among its subtypes is one that requires *zu* and the infinitive form of the verb; moreover, although Ackerman & Webelhuth do not spell this out in detail, this subtype specifies the modal force associated with this passive construction but not the others. Finally, both the predicative and attributive forms are subtypes of all the preceding, but these inherit also
from distinct supertypes for predicative and attributive passives of all kinds. The supertype for predicative passives constrains them to occur with an auxiliary; its subtype for *zu* + infinitive passives further specifies that the auxiliary is *sein*. The attributive passive type, on the other hand, inherits from modifier types generally, which do not allow auxiliaries, but do require agreement in person, number, and case with the modified noun. In summary, the hierarchical lexicon is deployed here to factor out the differing properties of the various German passive constructions, each of which includes its particular combination of properties via multiple inheritance.
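
The effect of this multiple inheritance hierarchy can be mimicked with ordinary class inheritance. The sketch below is our rendering, with invented class and attribute names, not Ackerman & Webelhuth's notation; it shows how the *zu*+infinitive predicative passive accumulates its constraints:

```python
# An illustrative class hierarchy mirroring the factored German passive
# types discussed above; all names and attributes are invented.

class UnivPasBasLci:                    # all passives: logical subject demoted
    demoted_subject = True

class PersonalPassive(UnivPasBasLci):   # base verb's object becomes subject
    object_promoted = True

class GermanPersonalPassive(PersonalPassive):
    agent_marker = "von"                # logical subject realized in a von-PP

class ZuInfinitivePassive(GermanPersonalPassive):
    verb_form = "zu-infinitive"
    modal_semantics = True              # the extra modal meaning of this passive

class PredicativePassive:               # predicative passives need an auxiliary
    needs_auxiliary = True

class PredZuInfPassive(ZuInfinitivePassive, PredicativePassive):
    auxiliary = "sein"                  # the zu-infinitive passive uses sein

p = PredZuInfPassive()
print(p.demoted_subject, p.agent_marker, p.auxiliary)  # True von sein
```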

# **4.4 Default inheritance in the lexicon**

So far, we have assumed rigid, monotonic inheritance of all information in supertypes to their subtypes; none of the inherited information can be overridden. This runs into difficulties when dealing with lexical entries that appear to be exceptional in some way, the obvious examples being morphological irregularities. How can productive regular forms such as \**childs* be blocked, and only *children* allowed as a lexical entry? Under default unification, although the plural of *child* would inherit information from the *plural-noun* type that entails the phonology \**childs*, this regular plural form is overridden by the irregular form specified in the pertinent lexical entry.

Several approaches to exceptions and irregularities have been proposed; we will focus first on *default unification* and examine an alternative involving type underspecification in the following section. Various complex issues arise in attempting to formulate a workable system of default unification and inheritance. See, e.g., Briscoe & Copestake (1999) for a brief overview of various ways that default unification might be defined. Lascarides & Copestake (1999) list several desirable criteria that such a system should meet.


They explore the properties of their system, called YADU, in considerable detail. The intent is to preserve the behavior of non-default unification in cases
where no default information is present, and for defeasible information at a more specific level in the type hierarchy to override defeasible information at a more general level.<sup>10</sup>

We now sketch how YADU functions, using the example of English verb forms in Lascarides & Copestake (1999). The pertinent linguistic facts here are as follows: English past and passive participles are always identical in form, (simple) past tense suffixes are usually the same as the corresponding participles', and the past tense suffix of most verbs is *-ed*. The last two statements are defeasible, while the first is not. In YADU, each type is represented with a nondefeasible typed feature structure, plus a set of defeasible feature structures, each with an associated type. The type hierarchy in Figure 4 provides an example (here, the nondefeasible information comes first, and the set of defeasible structures follows the slash).

$$\begin{array}{l} \textit{verb}: \begin{bmatrix} \text{PAST } \top \\ \text{PSP } \boxed{1} \\ \text{PASP } \boxed{1} \end{bmatrix} \Big/ \left\langle \left\langle \begin{bmatrix} \text{PAST } \boxed{2} \\ \text{PSP } \boxed{2} \\ \text{PASP } \boxed{2} \end{bmatrix}, \textit{verb} \right\rangle \right\rangle \\[4ex] \textit{regverb}: \big[\ \big] \Big/ \left\langle \left\langle \begin{bmatrix} \text{PAST } +\textit{ed} \end{bmatrix}, \textit{regverb} \right\rangle \right\rangle \\[3ex] \textit{pst-t-verb}: \big[\ \big] \Big/ \left\langle \left\langle \begin{bmatrix} \text{PAST } +\textit{t} \end{bmatrix}, \textit{pst-t-verb} \right\rangle \right\rangle \end{array}$$

Figure 4: A type hierarchy of "rules" for past forms of English verbs, incorporating nondefault information (to the left of /) and default information (to the right of /), from Lascarides & Copestake (1999: 61)

In Figure 4, the most general type *verb* stipulates the identity of the past participle and passive participle forms as nondefault information. The value of PAST, the simple past tense form, is unspecified, because some English verbs have

<sup>10</sup>As Malouf (2000: 126) states: "Default inheritance as appealed to by, e.g., Sag (1997), is an abbreviatory device that helps simplify the construction of lexical type hierarchies. When used in this way, defaults add nothing essential to the analysis. They simply provide a mechanism for minimizing the number of types required. Any type hierarchy that uses defaults can be converted into an empirically equivalent one that does not use defaults, but is perhaps undesirable for methodological reasons."


irregular past tense forms (the symbol ⊤ denotes the most general type, and here indicates merely that nothing more specific can be stated about the PAST form of every English verb). On the right-hand side of the / is default information; this states that normally, the value of PAST is shared with the values of both participle forms, whether the verb is regular (e.g., *walked*) or irregular (e.g., *understood*), although there are also verbs for which this is not true, such as *give* (*gave*, *given*), which override this default. For regular verbs (type *regverb*), the value of PAST will be, by default, the result of a function that suffixes *-ed* to the verb stem (Lascarides & Copestake gloss over the details of morphology and phonology here), but this is defeasible. In the more specific type *pst-t-verb*, for instance, the default *-ed* is overridden by (again default) information that the suffix is *-t*.

Thus a *pst-t-verb* like *burn*/*burnt* inherits the nondefault information from *regverb* and *verb*, but overrides the regular past forms. The default information in *pst-t-verb* is associated with a more specific type than that in *regverb*, so it takes precedence in YADU's unification procedure. And as Lascarides & Copestake note (p. 62): "This is the reason for separating *verb* and *regverb* in the example above, since we want the *+t* value to override the *+ed* information on *regverb* while leaving intact the default reentrancy which was specified on *verb*. If we had not done this, there would have been a conflict between the defaults that was not resolved by priority." For morphological irregularities such as *children*, the same devices can be used, with a type for the lexical entry of *child* that overrides the regular plural form.
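
The override-by-specificity behavior can be approximated as follows; this toy sketch (ours) replaces typed feature structures with dictionaries, so it is only an analogy to YADU, not a rendering of it:

```python
# A toy analogy to YADU's specificity-based override: defaults listed from
# general to specific, with more specific types overriding earlier ones.

DEFAULTS = [
    ("verb",       {"PAST": "<shared with participles>"}),
    ("regverb",    {"PAST": "stem + ed"}),
    ("pst-t-verb", {"PAST": "stem + t"}),
]

def past_form(type_chain):
    """type_chain lists supertypes first, e.g. ['verb', 'regverb'].
    Defaults at more specific types override more general ones."""
    result = {}
    for t, info in DEFAULTS:
        if t in type_chain:
            result.update(info)
    return result["PAST"]

print(past_form(["verb", "regverb"]))                # stem + ed  (walked)
print(past_form(["verb", "regverb", "pst-t-verb"]))  # stem + t   (burnt)
```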

As an example of the use of default, nonmonotonic inheritance outside of morphology, consider the account of the syntax of gerunds in various languages developed by Malouf (2000). Gerunds exhibit both verbal and nominal characteristics, and furnish a well-known example of seemingly graded category membership, which does not accord well with the categorical assumptions of mainstream syntactic frameworks. Roughly speaking, English gerunds, and their counterparts in other languages, act much like verbs in their "internal" syntax, allowing direct objects and adverbial modifiers, but function distributionally ("externally") as NPs. To take but a couple of pieces of evidence (see Malouf 2000: 27–33 for more details), gerunds can be the complement of prepositions, whereas finite clauses cannot (as in (27)); however, adverbs, not adjectives, can modify gerunds, while adjectives must be used to modify deverbal nouns (as in (28)).

(27) a. Pat is concerned about Sandy getting arrested.

	- b. \* Pat is concerned about (that) Sandy got arrested.

(28) a. Pat disapproved of my quietly/\*quiet departing.

	- b. Pat disapproved of my quiet/\*quietly departure.

One approach models these distinctions directly, via syntactic rules that allow an NP to be expanded as a constituent internally headed by a verb. As Malouf notes, this offers no account of the observed behavior of gerund-like forms across languages. Some possible combinations of noun-like and verb-like attributes are frequently attested cross-linguistically in gerunds and their equivalents, while others are rare or unattested. Cross-linguistically, gerunds vary in their subcategorization possibilities: some allow subjects and complements, while some allow only complements and no subjects. But there appear to be no cases of gerund-like lexical items that can take a subject but cannot take complements.

Instead of such unmotivated syntactic rules, Malouf posits a lexical rule, which converts the lexical category of a verb to *noun*, but otherwise preserves its verbal properties, such as subcategorization. With strictly monotonic inheritance, this poses problems, as it would force us to abandon useful generalizations about nouns other than gerunds (e.g., they do not take direct object complements, as many verbs and their gerunds do). Default inheritance provides one way to model the observed phenomena, without weakening the constraints on parts of speech to the point where no meaningful constraints distinguish them.

In Malouf's account, there are both "hard" constraints – a verb lexical entry, for example, must have a HEAD value of type *relational* (encompassing verbs, adjectives, and adpositions) – and "soft", overridable constraints – a verb lexical entry by default has a HEAD value of type *verb*. In addition, following Bouma et al. (2001), he posits the types *ext-subj* and *ext-spr*. The former constrains the HEAD value to *relational* and the first element of the ARG-ST list to be the SUBJ (only adjectives, adpositions, verbs, and predicative NPs have subjects), while the latter constrains the HEAD value to *noun* and the first element of the ARG-ST list to be the SPR (only nouns have specifiers), as shown in (29).

$$\text{(29) a. } \textit{ext-subj} \Rightarrow \begin{bmatrix} \text{HEAD } \textit{relational} \\ \text{SUBJ } \left\langle \boxed{1} \right\rangle \\ \text{ARG-ST } \left\langle \boxed{1}, \dots \right\rangle \end{bmatrix} \qquad \text{b. } \textit{ext-spr} \Rightarrow \begin{bmatrix} \text{HEAD } \textit{noun} \\ \text{SPR } \left\langle \boxed{1} \right\rangle \\ \text{ARG-ST } \left\langle \boxed{1}, \dots \right\rangle \end{bmatrix}$$
Malouf then specifies default HEAD values for the lexical classes *n* and *v* (see (30) for the latter's definition). As gerunds have properties of both nominal and
relational heads, they are subtypes of both, as shown in the multiple inheritance hierarchy in Figure 5. The *v* type, which concerns us here, has a default HEAD value *verb*, as shown in (30), in addition to the non-default, more general type *relational* that it also includes (default information follows the /).

Figure 5: A cross-cutting hierarchy of types of *head* according to Malouf (2000: 65)

(30) $\textit{v}: \begin{bmatrix} \text{HEAD } \textit{relational} \,/\, \textit{verb} \end{bmatrix}$

However, in the subtype of *v* called *vger*, the default value *verb* is overridden. In *vger*, the HEAD value is of the type *gerund*, which is a subtype of both *noun* and *relational*, but not of *verb*. The type *vger* is shown in (31), where *f-ing* is a function that produces the *-ing* form of an English verb from its root.

$$\text{(31) } \begin{bmatrix} \textit{vger} \\ \text{MORPH } \begin{bmatrix} \text{FORM } f_{\textit{ing}}\big(\boxed{1}\big) \\ \text{I-FORM } \boxed{1} \end{bmatrix} \\ \text{HEAD } \textit{gerund} \end{bmatrix}$$

The type *vger* is thus compatible with "verb-like" characteristics. But, as its HEAD is also a subtype of *noun*, its SUBJ list is empty and the first element on its ARG-ST list is its SPR value. In addition, gerunds allow direct complements (unlike ordinary nouns), but not subjects (unlike ordinary verbs). Malouf's hierarchy of types makes this prediction, in effect, because the *ext-spr* type requires that the "external argument" (the first on the ARG-ST list) is realized as the value of SPR.
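
The cross-cutting head types of Figure 5 translate naturally into class-based multiple inheritance; a minimal sketch (ours, with invented class names):

```python
# An illustrative encoding of the cross-cutting head hierarchy of Figure 5,
# where *gerund* inherits from both *noun* and *relational*.

class Head: pass
class Relational(Head): pass          # verbs, adjectives, adpositions, gerunds
class Noun(Head): pass
class Verb(Relational): pass
class Gerund(Noun, Relational): pass  # noun-like outside, verb-like inside

g = Gerund()
print(isinstance(g, Noun), isinstance(g, Relational), isinstance(g, Verb))
# True True False: gerunds count as nouns and relational heads, not as verbs
```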

While it would be possible to construct type hierarchies of lexical types, HEAD types, and so on that would allow for "anti-gerunds" – those that would act externally as nouns, allow subjects, but not permit direct complements – this would require reorganizing these type hierarchies to a considerable extent. Given that
many nouns besides gerunds – nominalizations, for example – are relational, it could be difficult to model a hypothetical language that permits only the anti-gerunds rather than the normal ones.

Malouf further notes a key difference between gerunds and exceptions like \**childs*/*children*: English gerunds are productive (and completely regular morphologically). If the same mechanisms of default unification are involved in both, what accounts for this difference? His answer is that productive and predictable processes involve on-line type construction (see Section 5.3 for details). The irregular form *children* must of course be learned and stored, not generated online. The default mechanisms described above, however, are employed at higher levels of the lexical hierarchy, and individual gerund forms *are* productively generated online. Note that, in contrast to the morphological and syntactic consistency among gerunds, English nominalizations display some idiosyncrasies that suggest at least some of them must be stored as distinct lexical items. Thus, as Malouf emphasizes, modeling prototypicality in the lexicon within HPSG can draw on both default inheritance and on-line type construction; together, they make "the connection between prototypicality and productivity" (p. 127).

# **5 Lexical rules**

In this section we describe the role lexical rules play in HPSG as well as their formal nature, i.e., how they model "horizontal" relations among elements of the lexicon. These are relations between variants of a single entry (be they subcategorizational or inflectional variants) or between members of a morphological family, as opposed to the "vertical" relations modeled through inheritance. Thus they provide a means to represent the intuitive notion of "derivation" of one lexeme from another.

# **5.1 What is the nature of lexical rules in HPSG?**

While lexical rules or similar devices have been invoked within HPSG since its inception, formalizing their nature and behavior still continues. The intent, however, has always been, as Lahm (2016) stresses, to treat lexical rules (typically written A ↦ B) to mean that for every lexeme or word described by A there is one described by B that has as much in common with A as possible.

Copestake & Briscoe (1992), Briscoe & Copestake (1999), Meurers (2001), and many others formalize the notion of lexical rule within HPSG by introducing a type, say *lex-rule*, with the attributes IN and OUT, whose values are respectively the rule's input and output lexical descriptions. As Briscoe & Copestake (1999)
note, lexical rules of this form also bear a close relationship to default unification. The information in the input is intended to carry over to the output by default, except where the rule specifies otherwise and overrides this information. But, as Lahm (2016) points out, a sound basis for the formal details of how lexical rules work is not easily formulated. Meurers' careful analysis of how to apply lexical rules to map a description A into a description B does not always work as intended, in that what we would expect to be licit inputs are not always actually such, and no output description results as a consequence. Fortunately, it is not clear that this is a severe problem in practice, and Lahm notes that he has not found an example of practical import where Meurers' lexical rule formulation would encounter the problems he raises.

In a slight variant of the representation of lexical rules proposed by Copestake & Briscoe and Meurers, the OUT attribute can be dispensed with; the information in the lexical rule type that is not within the IN value then constitutes the output of the rule. Ackerman & Webelhuth (1998: 87) employ this style of representation; their type *derived-lci* adds a LEXDTR attribute (equivalent to IN) that contains the input lexical entry's information. The difference between the two representations, with only the attributes PHON and SYNSEM included for expository purposes, is shown in (32).

$$\text{(32) a. } \begin{bmatrix} \text{IN } \begin{bmatrix} \text{PHON } a \\ \text{SYNSEM } b \end{bmatrix} \\[1ex] \text{OUT } \begin{bmatrix} \text{PHON } c \\ \text{SYNSEM } d \end{bmatrix} \end{bmatrix} \qquad \text{b. } \begin{bmatrix} \text{PHON } c \\ \text{SYNSEM } d \\ \text{IN } \begin{bmatrix} \text{PHON } a \\ \text{SYNSEM } b \end{bmatrix} \end{bmatrix}$$

In the variant in (32b), lexical rules are treated as subtypes of a *derived-lexical-sign* type, which can combine with other types in the lexical hierarchy, merely adding the derivational source via the IN value. Formulated in either fashion, lexical rules are essentially equivalent to unary syntactic rules, with the IN attribute corresponding to the daughter and the OUT attribute to the mother (or the rest of the information in the rule, if the OUT attribute is done away with). This is the way lexical rules are implemented in the English Resource Grammar (see http://www.delph-in.net/erg/ for demos and details about this large-scale implemented grammar of English) as well as in the CoreGram Project and the Grammix grammar development environment (see Müller 2007 and https://hpsg.hu-berlin.de/Software/Grammix/ for details on the Grammix software). See also
Bender & Emerson (2021: Section 3), Chapter 25 of this volume for remarks on implementations.
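
The two representations in (32) can be schematized as follows (a toy sketch; the data model is ours, and the field values are placeholder strings):

```python
# A schematic rendering of the two lexical-rule representations in (32):
# an explicit IN/OUT pair vs. a derived sign that records its input.

from dataclasses import dataclass

@dataclass
class Sign:
    phon: str
    synsem: str

@dataclass
class LexRuleInOut:          # style (32a): explicit IN and OUT
    IN: Sign
    OUT: Sign

@dataclass
class DerivedSign(Sign):     # style (32b): the rule *is* the output sign,
    IN: Sign = None          # with the input recorded under IN

base = Sign(phon="a", synsem="b")
rule_a = LexRuleInOut(IN=base, OUT=Sign(phon="c", synsem="d"))
rule_b = DerivedSign(phon="c", synsem="d", IN=base)
print(rule_a.OUT.phon, rule_b.phon, rule_b.IN.phon)  # c c a
```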

One clear advantage of this kind of representation, i.e., a representation in which the attribute OUT is dispensed with and lexical "rules" are simply subtypes of *derived-word* or *derived-lexeme*, is that they are then positioned in the lexical hierarchy and subject to the same implicational constraints as other classes of words. They can also be organized in complex networks of more or less general rules. As Riehemann (1998) and Koenig (1999) show, if one includes in the lexical hierarchy unary-branching rules to model derivational morphology, a unified account of derivational processes that apply both productively to an open-ended set of lexemes as well as unproductively to another closed set of lexemes becomes possible. Consider the approach to derivational morphology taken by Riehemann (1998). Example (33) (Riehemann's (1)) illustrates *-bar* suffixation in German, a process by which an adjective that includes a modal component can be derived from verb stems (similar to English *-able* suffixation). A lexical rule approach could posit a verb stem input and derive an adjective output. As Riehemann stresses, though, there are many different subtypes of *-bar* suffixation, some productive, some unproductive, all sharing some information. This combination of productive and unproductive variants of a lexical process is exactly what the type hierarchy is meant to capture and what Riehemann's *Type-Based Derivational Morphology* capitalizes on. The structure in (34) presents the relevant information about Riehemann's type for regular *-bar* '-able' adjectives (see Riehemann 1998: 68 for more details). Critically, *-bar* adjectives include a singleton-list base (the value of MORPH-B) that records the information of the adjective's verbal base (corresponding to the would-be lexical rule's input). Because of this extra layer, the local information in the base (the *local* object under MORPH-B … LOCAL) and the *-bar* adjective (the *local* object under SYNSEM|LOCAL) can differ without being in conflict.

(33) Sie bemerken die Veränderung. Die Veränderung ist bemerkbar.
	they notice the change the change is noticeable
	'They notice the change. The change is noticeable.'

$$\text{(34) } \begin{bmatrix} \textit{reg-bar-adj} \\ \text{PHON } \boxed{1} \oplus \left\langle \textit{bar} \right\rangle \\ \text{MORPH-B } \left\langle \begin{bmatrix} \textit{trans-verb} \\ \text{PHON } \boxed{1} \\ \text{LOCAL } \textit{local} \end{bmatrix} \right\rangle \end{bmatrix}$$


See also the chapters by Crysmann (2021: Section 2.2) and Müller (2021d: Section 3) for further discussion of Riehemann's proposal.

# **5.2 Phenomena accounted for by lexical rules**

Lexical rules have been put to many uses: derivational and inflectional morphology (Copestake & Briscoe 1995; see Emerson & Copestake 2015 for an alternative approach to inflection in HPSG that is morpheme-based), conversion in interaction with complex predicate formation (Müller 2010), negation (Kim & Sag 2002; Müller 2010), and diathesis alternations (Briscoe & Copestake 1999; Müller 2003; 2018a; Davis 2001). Moreover, proposals for lexical rules in HPSG have extended beyond what are traditionally or evidently viewed as lexical phenomena, to include treatments of affixal realization of arguments, extraction, unbounded dependencies, and adjuncts (Monachesi 1993; Pollard & Sag 1994: 378; van Noord & Bouma 1994; Keller 1995; Miller & Sag 1997). In this section, we describe the use of lexical rules to model the realization of arguments as extracted dependents or affixes, rather than complements. We concentrate on two of these cases (affixal realization of arguments and complement extraction), which we will contrast with alternative analyses not involving lexical rules presented by the same authors (see the next section). They thus provide a good illustration of some of the analytical choices available to model relations between variant lexical entries based on a single stem.

We begin with the Complement Extraction Lexical Rule (hereafter, CELR) proposed in Pollard & Sag (1994: 378), shown in (35). The input to the rule is any lexeme that selects for a syntactic argument ( 3 ) that the lexeme requires to be expressed as a complement (as indicated, this syntactic argument is also a member of the COMPS list). The output stipulates that this same syntactic argument is no longer a member of the COMPS list; however, the SLASH set now includes a new element, which is the local information of this syntactic argument ( 1 ). Informally stated, the input entry specifies that a syntactic argument must be realized as a complement, whereas the output entry specifies that the same syntactic argument must be realized by a non-local dependent (see Pollard & Sag 1994: Chapter 4 for the distinction between LOCAL and NONLOCAL information).

(35) Complement Extraction Lexical Rule (adapted from Pollard & Sag 1994: 378):

$$\begin{bmatrix} \text{ARG-ST } \left\langle \dots, \boxed{3} \begin{bmatrix} \text{LOC } \boxed{1} \end{bmatrix}, \dots \right\rangle \\ \text{COMPS } \left\langle \dots, \boxed{3}, \dots \right\rangle \end{bmatrix} \mapsto \begin{bmatrix} \text{ARG-ST } \left\langle \dots, \boxed{3} \begin{bmatrix} \text{LOC } \boxed{1} \\ \text{SLASH } \left\{ \boxed{1} \right\} \end{bmatrix}, \dots \right\rangle \\ \text{COMPS } \left\langle \dots \right\rangle \\ \text{SLASH } \left\{ \boxed{1} \right\} \end{bmatrix}$$

A similar use of lexical rules to model alternative realizations of arguments can be found in Monachesi (1993), who analyzes alternations between complements and pronominal object affixes (traditionally called object clitics) in Italian in a way that parallels the French examples in (8). The symbol ○ in the rule shown in (36), a.k.a. the "shuffle" operation, stands for the unordered concatenation of two lists, since any member of the input's COMPS list can be realized as a clitic and therefore not be included in the output's COMPS list (see Müller (2021b: 391), Chapter 10 of this volume for a more formal explanation of ○). In the output of the lexical rule in (36), a subset of the list of complements in the input (2) corresponds to a list of clitic *synsem*s, realized as prefixes through inflectional rules not shown here.

(36) Clitic Lexical Rule adapted from Monachesi (1993: 439):

$$\begin{bmatrix} \text{COMPS } \boxed{1} \circ \boxed{2} \end{bmatrix} \mapsto \begin{bmatrix} \text{COMPS } \boxed{1} \\ \text{CLTS } \boxed{2} \end{bmatrix}$$
Here as well, a lexical rule is employed in an analysis of what might well be considered a syntactic phenomenon. The possibility of treating phenomena like extraction and pronominal object affix placement at a lexical level, however, makes sense when they are considered fundamentally as matters of the combinatorial requirements of predicators, rather than effects of movement.
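
The shuffle operation itself is easy to state procedurally. The following sketch (ours) enumerates all interleavings of two lists that preserve the relative order within each list, which is what ○ ranges over in (36):

```python
# The "shuffle" of two lists: all interleavings preserving the relative
# order within each list; a direct (exponential) sketch for illustration.

def shuffle(xs, ys):
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    return ([[xs[0]] + rest for rest in shuffle(xs[1:], ys)] +
            [[ys[0]] + rest for rest in shuffle(xs, ys[1:])])

# a COMPS list splits into canonically realized complements and clitics:
for combo in shuffle(["NP"], ["CL1", "CL2"]):
    print(combo)
# ['NP', 'CL1', 'CL2'], ['CL1', 'NP', 'CL2'], ['CL1', 'CL2', 'NP']
```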

Before turning to the alternatives, we note in passing that lexical rules are inherently "directional", with an input and an output. This seems intuitively correct in the cases we have discussed, but might not always be so. Is there inherent directionality, for example, between the causative and inchoative alternants of verbs such as *melt* or *slide* or between the ditransitive and prepositional object frames of verbs such as *give*, as Goldberg (1991: 731) or Goldberg (1995: 18–23) ask? The alternatives to lexical rules described in the following section lack this notion of directionality.

# **5.3 Alternatives to lexical rules**

In this section we briefly examine two alternatives to lexical rules, each involving underspecification. The types of members of the ARG-ST list might be underspecified so that a single lexical description can correspond to more than one subcategorization frame. Or the type of the entry itself may be underspecified, so that it subsumes multiple inflectional or derivational forms. In both cases, the intent is that sufficiently underspecified information covers multiple entries that would otherwise have to be specified and related by lexical rules. We begin with
alternatives to the complement extraction and clitic lexical rules in (35) and (36), proposed in Bouma et al. (2001) and Miller & Sag (1997).

In both cases, the idea is to distinguish between "canonical" and "non-canonical" realizations of syntactic arguments, as shown in the hierarchy of *synsem* types in Figure 6. "Canonical" means local realization as a complement or subject/specifier, and "non-canonical" means realization as an affix or filler of an unbounded dependency. Linking constraints between semantic roles (values of argument positions) and syntactic arguments (members of ARG-ST) do not specify whether the realization is canonical or not; thus they retain their original form. Only canonical members of ARG-ST must be structure-shared with members of valence lists. The two constraints that determine the non-canonical realization of fillers are shown in (37) and (38). (37) specifies what it means to be a *gap-ss*, namely that the argument is extracted (its local information is "slashed") whereas (38) prohibits any *gap-ss* member from being a member of the COMPS list (see Bouma et al. 2001: 23). As these two constraints are compatible with either a canonical or extracted object, there is no need for the lexical rule in (35). (DEPS in (38) is an attribute Bouma et al. introduce that includes not only syntactic arguments, the value of ARG-ST, but also some syntactic adjuncts; ⊖ stands for list subtraction.)

Figure 6: Subtypes of *synsem*

$$\text{(37) } \textit{gap-ss} \Rightarrow \begin{bmatrix} \text{LOC } \boxed{1} \\ \text{SLASH } \left\{ \boxed{1} \right\} \end{bmatrix}$$

$$\text{(38) } \textit{word} \Rightarrow \begin{bmatrix} \text{SUBJ } \boxed{1} \\ \text{COMPS } \boxed{2} \ominus \textit{list}(\textit{gap-ss}) \\ \text{DEPS } \boxed{1} \oplus \boxed{2} \end{bmatrix}$$
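
The combined effect of (37) and (38) can be emulated as a small computation over dependents; the data model below is our invention, for illustration only:

```python
# A rough sketch of (37)-(38): gap synsems are subtracted from COMPS,
# so extracted arguments are never realized as local complements.

from dataclasses import dataclass

@dataclass
class Synsem:
    label: str
    gap: bool = False     # True for gap-ss (extracted arguments)

def comps_from_deps(subj, deps):
    """COMPS = (DEPS minus SUBJ) with gap-ss members subtracted, cf. (38)."""
    return [s for s in deps if s is not subj and not s.gap]

subj = Synsem("NP-subj")
obj = Synsem("NP-obj", gap=True)     # an extracted object
pp = Synsem("PP")
print([s.label for s in comps_from_deps(subj, [subj, obj, pp])])  # ['PP']
```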

Miller & Sag (1997) make a similar use of non-canonical relations between the ARG-ST list and the valence lists, eschewing lexical rules to model French pronominal object affixes (traditionally called clitics) and proposing instead the constraint on the type *cl-wd* (the type for verbs that include object affixes) shown in (39), where a subset of ARG-ST members, those that are realized as affixes (of type *aff*), are not also selected as complements.

(39) Constraints on words with clitics adapted from Miller & Sag (1997: 587):

$$\textit{cl-wd} \Rightarrow \begin{bmatrix} \text{FORM } F_{\textit{PRAF}}\left(\boxed{1}, \dots\right) \\ \text{I-FORM } \boxed{1} \\ \text{SS|LOC|CAT } \begin{bmatrix} \text{HEAD } \textit{verb} \\ \text{SUBJ } \boxed{2} \\ \text{COMPS } \boxed{3}\ \textit{list}(\textit{non-aff}) \end{bmatrix} \\ \text{ARG-ST } \boxed{2} \oplus \left( \boxed{3} \circ \textit{nelist}(\textit{aff}) \right) \end{bmatrix}$$

In both of these analyses, related sets of lexical entries that could be thought of as "generated by lexical rules" are instead regarded as the various possible ways of obeying constraints like those in (37) or (39). This comes at a cost of additional types and constraints for extraction, and a loosening of requirements for the correspondence between the ARG-ST list and the valence lists. However, these approaches, in dispensing with lexical rules, sidestep both the conceptual and representational issues that we noted earlier and attempt to restrict lexical rules to cases where they cannot be avoided, e.g., derivational morphology.

The second alternative to lexical rules based on underspecification was presented in Koenig & Jurafsky (1995) and Koenig (1999). Typically in HPSG, all possible combinations of types are reified in the type hierarchy (in fact, they must be present, per the requirement that the hierarchy be sort-resolved: Carpenter 1992b, Pollard & Sag 1994), or, equivalently, that each linguistic entity be assigned exactly one maximally specific type – a.k.a. *species* (Richter 2004: 78; Richter 2021: Section 2, Chapter 3 of this volume). Thus, if one partitions verb lexemes into transitive and intransitive and, orthogonally, into, say, finite verbs and gerunds (limiting ourselves to two dimensions here for simplicity), the type hierarchy must also contain the combinations transitive+finite, transitive+gerund, intransitive+finite, and intransitive+gerund. Naturally, this kind of fully enumerated type system is unsatisfying. For one thing, there is no additional information that the combination subtype *transitive+finite* carries that is not present in its two supertypes *transitive* and *finite*, and similarly for the other combinations. In contrast to the "ordinary" types, posited to represent information shared by classes of lexemes, these combinations seem to have no other purpose than to satisfy a formal requirement on the mathematical structure of a type hierarchy (namely, that it forms a lattice under meet and join). Second, and related to the first point, this completely elaborated type hierarchy is redundant. Once you know that all verbs fall into two valence classes, transitive and
intransitive, and simultaneously into two inflectional classes, finite and gerund, and that valence and inflection are two orthogonal dimensions of classification of verbs, you know all you need to know; the type of any verb can be completely predicted from these two orthogonal dimensions of classification and standard propositional calculus inferences.<sup>11</sup>

Figure 7 is a simplified hierarchy of verb lexemes we use for strictly expository purposes, where the boxed labels in small caps VFORM and ARG-ST are mnemonic names of orthogonal dimensions of classification of subcategories of verbs (and are not themselves labels of subcategories). Inheritance links to the predictable subtypes are dashed and their names grayed out; this indicates that these types can be inferred, and need not be declared explicitly as part of the grammar. A grammar of English would include statements to the effect that head information about verbs includes a classification of verbs into finite or base forms (of course, there would be more types of verb forms in a realistic grammar of English) as well as a classification into intransitive and transitive verbs (again, a realistic grammar would include many more types).

Figure 7: An example of on-line type construction
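
The idea of on-line type construction sketched in Figure 7 amounts to computing combination types on demand from the declared dimensions, rather than enumerating them in the grammar; the names below are illustrative only:

```python
# On-line type construction sketched as building combination types when
# needed; dimension and value names are invented for illustration.

from itertools import product

DIMENSIONS = {
    "vform": ["fin", "base"],
    "arg_st": ["intrans", "trans"],
}

def construct_type(**choices):
    """Assemble a combination type like 'fin+trans' on demand, instead of
    declaring all combinations in the grammar."""
    for dim, value in choices.items():
        assert value in DIMENSIONS[dim], f"{value!r} not in dimension {dim}"
    return "+".join(choices[d] for d in DIMENSIONS)

# the four combination types of Figure 7 are inferred, never stipulated:
for vform, args in product(DIMENSIONS["vform"], DIMENSIONS["arg_st"]):
    print(construct_type(vform=vform, arg_st=args))
```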

Crysmann & Bonami (2016) have shown how this *online type construction*, where predictable combinations of types of orthogonal dimensions of classification are not reified in the grammar, is useful when modeling productive inflectional morphology. Consider, for example, exponents of morphosyntactic features whose shape remains constant, but whose position within a word's

<sup>11</sup>One possible way of making formally explicit the idea behind on-line type construction within the model-theoretic approach to HPSG that is now standard (King 1989; Richter 2004; 2021) is to allow maximally specific sorts, or species, to be either sets of species or non-atomic sums of species, just in cases where orthogonal dimensions of classification have been used since Flickinger (1987). For reasons of space, we do not pursue this line of inquiry in this chapter.

template (to speak informally here) varies. One case like this is the subject and object markers of Swahili, which can occur in multiple slots in the Swahili verb template (Stump 1993; Bonami & Crysmann 2016).

For reasons of space we illustrate the usefulness of this dynamic approach to type creation, the Type Underspecified Hierarchical Lexicon (TUHL), with an example from Koenig (1999): the cross-cutting classification of syntactic/semantic information and stem form in the entry for the French verb *aller* (see Bonami & Boyé 2002 for a much more thorough discussion of French stem allomorphy along similar lines; Crysmann & Bonami's much more developed approach to stem allomorphy would model the same phenomena differently and we use Koenig's simplified presentation for expository purposes only). The forms of *aller* are based on four different suppletive stems: *all-* (1st and 2nd person plural of the indicative and imperative present, infinitive, past participle, and imperfective past), *i-* (future and conditional), *v-* (1st, 2nd, or 3rd person singular and 3rd person plural of the indicative present), and *aill-* (subjunctive present). These four suppletive stems are shared by all entries (i.e., senses) of the lexeme *aller*: the one which means 'to fit' as well as the one which means 'to leave', as shown in (40) (see Koenig 1999: 40–41). The cross-cutting generalizations over lexemes and stems are represented in Figure 8. Any *aller* stem combines one entry and one stem form. In a traditional HPSG type hierarchy, each combination of types (grayed out in Figure 8), would have to be stipulated. In a TUHL, these combinations can be dynamically created when an instance of *aller* needs to be produced or comprehended.

	- b. Marc s'en ira.
		Marc 3.REFL.of.it go.FUT.3SG
		'Marc will leave.'
	- c. Ce costume te va bien.
		this suit you go.PRS.3SG well
		'This suit becomes you.' (lit. goes well to you)
	- d. Il faut que j'y aille.
		it must that I.to.there go.SUBJ.PRS.1SG
		'I must go there.'
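
As a toy illustration (ours; the cell labels are informal, not Koenig's notation), the stem distribution just described amounts to a lookup from paradigm cells to the four suppletive stems:

```python
# A toy lookup of the four suppletive stems of French 'aller' described
# above; the same stems are shared by every sense of the lexeme.

ALLER_STEMS = {
    "indicative-present-1pl": "all", "indicative-present-2pl": "all",
    "infinitive": "all", "past-participle": "all", "imperfective-past": "all",
    "future": "i", "conditional": "i",
    "indicative-present-1sg": "v", "indicative-present-2sg": "v",
    "indicative-present-3sg": "v", "indicative-present-3pl": "v",
    "subjunctive-present": "aill",
}

def stem_for(cell):
    """Return the suppletive stem used in a given paradigm cell."""
    return ALLER_STEMS[cell]

print(stem_for("future"))               # i    (ira)
print(stem_for("subjunctive-present"))  # aill (aille)
```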

Figure 8: A hierarchy of lexical entries and stem-forms for the French verb *aller*, adapted from Koenig (1999: 137)

Both the distinction between canonical and non-canonical *synsem* and type underspecification avoid conflict between the information specified in the variants of words based on a single lexeme (e.g., conflicts on how syntactic arguments are realized); they abstract over the relevant pieces of conflicting information. Underspecifying information included in lexical entries or lexical types allows a single entry or type to stand for the two distinct entries or types that would be related as input and output by lexical rules.

Lexical rules have played a crucial role in the rise of lexicalist approaches to syntax. But the two alternative analytical tools we discussed in this section (which, of course, can be combined in an analysis) have chipped away at their use in HPSG. Inflectional morphology is now dealt with through lexical types associating morphosyntactic features with forms/positions and constraints on words (ensuring that all morphosyntactic features are realized, see Crysmann (2021: 968), Chapter 21 of this volume). Non-canonical realization of syntactic
arguments as affixes or fillers in unbounded dependencies is modeled by many (but see Levine & Hukari 2006, among others, for an opposing view) by distinguishing kinds of members of the ARG-ST list and constraints on words that relate valence, argument structure, and dependents lists.<sup>12</sup>

So, what remains of the case for lexical rules? Well, first, as we showed above, lexical rules are now simply unary-branching rules within the lexical part of the type hierarchy. As such they are not formally distinct from the rest of the lexical hierarchy or the hierarchy of signs, as they used to be. Second, they are not meant to model just unproductive processes, as they were originally intended to in Jackendoff (1975) and Bochner (1993). They can be used to model unproductive processes, but they can also model productive derivational processes (indeed both at once, when a single derivational process has both productive and unproductive variants; see Riehemann 1998 and the discussion of her approach in the chapters by Crysmann (2021: Section 2.2) and Müller (2021d: Section 3)).

Still, the existence of two distinct ways of dealing with potential conflict of information – underspecification or unary-branching rules – raises the issue of which one should be used when. Unfortunately, there is no general guideline; it depends on the nature of the data that needs to be modeled. Müller (2006; 2010) argues that diathesis phenomena, broadly speaking, favor a lexical rules approach over a phrase-structural constructional approach à la Goldberg (1995) or an online type construction approach suggested in Kay (2002). The arguments are convincing, but it should be noted that some of the data involves derivational morphology (e.g., causatives) or passive morphemes, which calls for a Type-Based Derivational Morphology of the kind Riehemann (1998) argues for (such an approach was suggested in Koenig 1999: Chapter 4). What remains unclear to us is whether there are instances where lexical rules as unary-branching rules are a better model of "horizontal" generalizations that do not involve morphological processes, i.e., whether the kind of lexical rules Pollard & Sag (1994) propose (e.g., the Complement Extraction Lexical Rule) are ever motivated over the underspecification treatment of such phenomena proposed in Bouma et al. (2001).

# **6 Conclusion**

Our principal goals in this chapter have been to present the HPSG viewpoint on the structure and content of individual lexical entries, and the organization of the

<sup>12</sup>But see Levine & Hukari (2006), Müller (2015), Müller & Machicao y Priemer (2019: Section 4.9) and Müller (2021c) for trace-based approaches.

lexicon as a whole. Unsurprisingly, both of these are pervaded by HPSG's lexicalist stance. With regard to lexical entries, this entails informationally rich and sometimes complex representations. A lexical entry models not only a word's idiosyncratic properties, but also its general morphological, distributional, combinatorial, and semantic characteristics. Consequently, HPSG researchers have devoted a great deal of attention to representing all of these in a parsimonious way, so as to avoid massive redundancy in the lexicon. We have surveyed several techniques addressing how to parcel out information shared among entries into descriptions that are true of sets of entries. First, feature geometry plays a key role in organizing portions of this information within a lexical entry in "packages" that tend to recur throughout the lexicon. This in turn allows these recurring portions to be associated with types in a hierarchy. Through inheritance, these common elements can be stated in just one location for the class of words that share them, and multiple inheritance makes it possible to represent numerous cross-cutting classifications of words. We have shown two ways in which HPSG scholars have exploited these mechanisms. One is by creating a hierarchy of subtypes of *word* and/or *lexeme*, each with associated constraints. The other, probably more commonly employed in current work, is to posit type hierarchies of various objects within lexical entries, along with implicational statements that constrain the content of a lexical entry containing those types of objects.

This hierarchical character of the HPSG lexicon serves to model the "vertical" relationships among classes of words, based on properties like part of speech, subcategorization, linking, morphological and paradigmatic classes, and so forth. There is also a "horizontal" aspect of lexical relations, however, for which lexical rules explicitly relating one class of lexemes or words to another have been proposed. While their original use was primarily to model systematic sets of, say, forms in an inflectional paradigm, HPSG's lexicalist approach to syntax has also seen them employed in accounts of phenomena such as extraction, traditionally regarded as outside the lexicon. We also presented two alternatives to lexical rules that appear to handle these phenomena equally well. One involves underspecification within lexical entries in a way that permits them to describe the right range of related forms, while the other allows underspecification within type hierarchies, and requires fully specified types to be constructed "online". Both of these alternatives, like lexical rules, avoid massively repetitive specification of properties of families of systematically related words. Lexical rules as well as the two alternatives we outlined are independently needed and, although one can make suggestive remarks as to when to use lexical rules or either alternative, the issue cannot be settled *a priori* and must be argued on a case by case basis.

But the rich and intricate hierarchical lexicon cum lexical rules is a defining, enduring, and pervasive feature of HPSG, more prominent here than in almost any other grammatical framework.

# **Acknowledgments**

We thank Anne Abeillé for very helpful comments and Stefan Müller for so carefully reading and commenting on several versions of the manuscript and helping us improve this chapter considerably. We thank Elizabeth Pankratz for editorial comments and proofreading.

# **References**









# **Chapter 5**

# **HPSG in understudied languages**

# Douglas L. Ball

Truman State University

Work within HPSG has explored typologically different and genetically diverse languages, though the framework is not well known for such explorations. This chapter details some of that work, focusing on the phenomena of argument indexing (pronoun incorporation or agreement), non-accusative alignment, and VSO constituent order. Examination of proposed and possible analyses within these areas reveals that HPSG can flexibly handle a wide range of languages, all while maintaining a certain uniform "underlying structure" within the analyses.

# **1 Introduction**

To date, the most intensely studied language within the HPSG framework has been English; this follows the trend in modern syntactic theorizing at large: English is currently the best-described language in the world. Still, there has been plenty of work within HPSG on languages other than English (ISO 639-3 code: eng); in fact, substantial work has occurred within the framework<sup>1</sup> on German (ISO: deu; Crysmann 2003; Müller 2013), Danish (ISO: dan; Müller & Ørsnes 2015), Norwegian (ISO: nor; Hellan & Haugereid 2003), French (ISO: fra; Abeillé & Godard 2000; 2002; 2004; Abeillé et al. 2006), Spanish (ISO: spa; Marimon 2013), Portuguese (ISO: por; Costa & Branco 2010), Mandarin Chinese (ISO: cmn; Müller & Lipenkova 2013; Yang & Flickinger 2014), Japanese (ISO: jpn; Siegel, Bender & Bond 2016), Korean (ISO: kor; Kim, Yang, Song & Bond 2011; Kim 2016), and Persian (ISO: fas; Taghvaipour 2004; 2005a,b; 2010; Müller 2010; Müller & Ghayoomi 2010; Bonami & Samvelian 2009; Samvelian & Tseng 2010), among others.

<sup>1</sup>Citations in this paragraph are to works, if available, whose focus is on the entire morphosyntax of the language in question rather than on particular issues in these languages.

Douglas L. Ball. 2021. HPSG in understudied languages. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 177–216. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599826


However, work within HPSG is not particularly well known for exploring a wide range of typologically and genetically diverse languages, certainly not to the degree of its constraint-based lexicalist cousin, Lexical Functional Grammar. Nevertheless, there has been work within HPSG on such languages; this chapter will discuss some of this work as well as suggest some further avenues for HPSG work on these languages.

The term I will employ for these typologically and genetically diverse languages is *understudied languages*. Which languages qualify as understudied languages, though? Does the term just cover languages that have not previously been investigated, syntactically? Or maybe all the languages without any previous HPSG work? Or maybe the term encompasses any language that is not the most described language, English? Though I reject all these (somewhat jocular) definitions, I do grant that *understudied language* is surely a fuzzy category, with boundaries that are difficult to demarcate and with conditions for inclusion that could be controversial. As a working benchmark for this chapter, I will suppose that the term *understudied languages* includes those languages that have a combined native and non-native speaker population of 1.2 million or fewer (roughly 0.015% of the world's population at present), that are spoken currently or have gone extinct within the last 120 years, that are generally spoken in a smaller, contiguous part of the globe, and that are not usually employed in international diplomacy or commerce. With this benchmark, languages<sup>2</sup> like Tongan (Polynesian, Austronesian; Tonga; ISO: ton), Kimaragang (Dusunic, Austronesian; Sabah, Malaysia; ISO: kqr), Warlpiri (Ngumpin-Yapa, Pama-Nyungan; west central Northern Territory, Australia; ISO: wbp), Burushaski (isolate; Gilgit-Baltistan, Pakistan; ISO: bsk), Lezgian (Lezgic, Nakh-Dagestanian; southern Dagestan, Russia; ISO: lez), Maltese (Semitic, Afro-Asiatic; Malta; ISO: mlt), Basque (isolate; Basque Country, Spain & France; ISO: eus), Welsh (Celtic, Indo-European; Wales, UK; ISO: cym), Oneida (Iroquoian; New York & Wisconsin, USA; Ontario, Canada; ISO: one), Coast Tsimshian (Tsimshianic; NW British Columbia, Canada & SE Alaska, USA; ISO: tsi), Yucatec Maya (Mayan; Yucatan Peninsula, Mexico & Belize; ISO: yua), and Macushi (Cariban; Roraima, Brazil, E Venezuela, & SE Guyana; ISO: mbc) would all be included, while the eleven languages mentioned in the first paragraph would not.<sup>3</sup>

<sup>2</sup>The locations and genetic affiliations of the languages listed here were checked at Hammarström et al. (2018).

<sup>3</sup>Some of the languages listed above will be discussed further in this chapter. Others from the above list are well-known understudied languages from the linguistics literature. A few of these languages have HPSG work that is not mentioned elsewhere in this chapter: on Tongan, see also Dukes (2001); on Warlpiri, see Donohue & Sag (1999); on Maltese, see Müller (2009); on Basque, see also Crowgey & Bender (2011); on Oneida, see also Koenig & Michelson (2010); on Yucatec Maya, see Dąbkowski (2017).


The denotation of the term *understudied language* is intended to be different from that of *endangered language*: nothing in the above supposition of what an understudied language is says anything about whether a language may (or may not) cease to be spoken within the next five to seventy years (see discussion in Krauss 1992 and Simons & Lewis 2013 for more on endangered languages and the crisis they face). However, the two terms do, in actuality, overlap: many understudied languages are endangered languages. While the use of HPSG (or other formal syntactic frameworks) has no direct bearing on the continued viability of a particular language, the practitioners of HPSG join with other linguists in seeing the importance of documenting such languages and supporting the rights of the communities that speak endangered languages to continue speaking them.

Understudied languages exhibit a great variety of syntactic behaviors – some of them quite similar to "well-studied languages", some of them quite different – and these languages do not form an obvious natural class, syntactically. Due to space limitations, I will focus on just a very small portion of the syntactic phenomena of understudied languages: argument indexing, non-accusative alignment (chiefly ergativity), and VSO constituent order. These phenomena and their analyses will give the reader a sense of how HPSG has been or could be applied to understudied languages. Unfortunately, this means that a collection of phenomena made famous by understudied languages – including, among others, noun incorporation (but see Malouf 1999; Runner & Aranovich 2003; Ball 2005a,b; 2008), serial verbs (but see Muansuwan 2001; 2002; Kropp Dakubu et al. 2007; Müller & Lipenkova 2009; Lee 2014), clause-chaining, evidentiality systems (but see Lee 2012), object-initial word order, and applicatives (but see Runner & Aranovich 2003; Ball 2008; 2010) – will not be discussed.

In going through the phenomena to be discussed, it will become clear that HPSG can flexibly handle a wide range of languages even while keeping its core characteristics. In fact, in most areas of analytic interest, several different approaches within the framework are equally viable at the outset. Relatedly, the analysis of many areas, especially from a cross-linguistic perspective, is far from settled. This seems to me to be an advantage: it allows competing analyses to be modeled clearly and precisely, while allowing empirical facts to better adjudicate between approaches.<sup>4</sup>

In my discussion, I will move through the three areas of argument indexing (in Section 2), non-accusative alignment (Section 3), and VSO constituent order (Section 4), which corresponds to decreasing pervasiveness – roughly estimated – for each phenomenon across the world's languages.

<sup>4</sup>This point is also made in Fokkens (2014), especially in Chapter 1.


# **2 Argument indexing**

Widespread among all sorts of natural languages – understudied or not – is what Haspelmath (2013) terms argument indexing. In argument indexing, morphologically dependent elements – that is, affixal elements usually located within or near the verb and with denotations (seemingly) similar to pronouns – either occur in place of arguments of the main semantic predicate of the clause or alongside them.<sup>5</sup> While this phenomenon occurs even a bit in English and throughout other European languages, argument indexing in understudied languages tends to be more "rampant": that is, all (or most) of the verb's arguments are indexed, rather than just the subject, as is the most common pattern in Europe (Siewierska 2013b). When the argument indexing is "rampant", its treatment within the syntax (and within the morphology–syntax interface) of a language becomes a key question. HPSG analyses offer several possible answers to how the syntax of argument indexing works, all while maintaining the framework's surface orientation. Empirically, it is clear that not all argument indexes behave in quite the same way in all languages, so I will explore the analysis of two subtypes of indexes in the sections to follow: first, indexes that do not co-occur with external, role-sharing noun phrases, and, second, indexes that can co-occur with external, role-sharing noun phrases.<sup>6</sup>

# **2.1 Indexing in complementary distribution with conominals**

Some argument indexes in some languages have the property that they do not – and cannot – appear with a non-pronominal element sharing their syntactic/semantic role in the same narrow clause (I will refer to these non-pronominal elements as *conominals*, following Haspelmath 2013: 205). The term that Haspelmath suggests – and that I will use – for such argument indexes is pro-index. One language showing pro-index behavior with its argument indexes is Macushi, as revealed by the examples in (1):

<sup>5</sup>Thus, this area includes what has been considered to be predicate-argument agreement as well as what some consider to be "pronoun incorporation", though one of the key points of Haspelmath (2013) is that the pre-existing terminology – if not also the pre-existing analyses – in this domain has been misleading.

<sup>6</sup>See also Saleem (2010) for a similar – though not identical – analysis of the same analytical domain.


(1) Macushi [mbc]
    a. i-koneka-'pî-u-ya
       3SG.ABS-make-PST-1SG-ERG
       'I made it.'
    b. * uurî-ya i-koneka-'pî-u-ya
       1SG-ERG 3SG.ABS-make-PST-1SG-ERG
       Intended: 'I made it.'

The example in (1a) is just a verb with all its arguments realized as argument indexes. The example in (1b) clearly reveals the pro-index behavior of the argument indexes: the affixed verb is incompatible with an independent pronoun, such as *uurîya* '1SG.ERG'.

The pro-index phenomenon has a straightforward (and, as a result, commonly assumed) analysis within HPSG. The analysis was originally proposed by Miller & Sag (1997) for French "clitics", but it could equally be applied to the Macushi case above, among others. Key to this analysis is the idea, found in most versions of HPSG emerging in the mid-to-late 1990s, that there are separate kinds of lists for the combinatorial potential(s) of heads. In fact, not only are there these separate lists, but there can be (principled) mismatches between them (see Abeillé & Borsley 2021: Section 4.1, Chapter 1 of this volume and Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). The first of these lists is the ARGUMENT-STRUCTURE (ARG-ST) list. This list handles phenomena related to the syntax–semantics interface, like linking (Davis 2001), case assignment (Meurers 1999; Przepiórkowski 1999; Przepiórkowski 2021, Chapter 7 of this volume), and binding restrictions (Manning & Sag 1998; Wechsler & Arka 1998; Müller 2021a, Chapter 20 of this volume). The other lists are the two valence lists, the SUBJECT (SUBJ) list and the COMPLEMENTS (COMPS) list. These are concerned with the "pure" syntax and mediate which syntactic elements can combine with which others.

On the Miller & Sag-style analysis of pro-indexes, the verb's ARG-ST list contains feature descriptions corresponding to all of its arguments. For the examples in (1), the verb's ARG-ST list would include a feature description for both the semantic maker and the semantic element that is made (as will appear in (4)). However, the same verb's SUBJ and COMPS lists would contain no elements corresponding to any affixed arguments. What prompts this disparity? The arguments realized by affixes correspond to a special kind of feature description on the ARG-ST list, typed *non-canonical*.<sup>7</sup> (Intuitively, these arguments are realized in a non-canonical way.) Feature descriptions of the *non-canonical* type differ from their sibling type *canonical* in how they interact with the SUBJ and COMPS lists. Governing the relationship between the ARG-ST and these valence lists is the Argument Realization Principle, which is stated in (2):<sup>8</sup>

(2) Argument Realization Principle (adapted from Ginzburg & Sag 2000: 171):
$$\textit{word} \Rightarrow \left[\text{SS|LOC|CAT}\ \begin{bmatrix}\text{SUBJ}\ \boxed{1}\\ \text{COMPS}\ \boxed{2} \ominus \textit{list}(\textit{non-canonical})\\ \text{ARG-ST}\ \boxed{1} \oplus \boxed{2}\end{bmatrix}\right]$$

The constraint says that the ARG-ST list is split into two parts, 1 and 2. The first part is identified with the SUBJ list; it can be empty or it can contain one or more elements. Usually the length of the SUBJ list is limited to one element.<sup>9</sup> A list of *non-canonical* elements is then subtracted from 2, and the result of this difference is the value of COMPS. This formulation of the Argument Realization Principle allows non-canonical elements like clitics and gaps in the SUBJ and COMPS lists. As Ginzburg & Sag (2000: 171) point out, this is not a problem, since overt signs that are combined with heads by the Head-Complement Schema or the Head-Subject Schema have a SYNSEM value of type *canonical* and hence could never be combined with heads having elements of type *non-canonical* in their valence lists. Ginzburg & Sag (2000: 40) assume the Principle of Canonicality, which is given in (3):

<sup>7</sup> In the version of HPSG of Ginzburg & Sag (2000), *non-canonical* was an immediate subtype of *synsem*; and the relevant feature descriptions were thus seen as syntactico-semantic complexes. In the latter-day version of HPSG known as Sign-Based Construction Grammar (Sag 2012), the *non-canonical* type was re-christened *covert* and was an immediate subtype of *sign*, the relevant feature descriptions being entire signs. In spite of these differences, which may seem significant, the analysis is very similar in both versions of the framework. Several subtypes of *non-canonical*/*covert* have been recognized, the subtype relevant for this example would be *aff*. But I will just use the *non-canonical* type here. For a general comparison of the version of HPSG used here and throughout the volume with SBCG see Müller (2021d: Section 1.3.2), Chapter 32 of this volume.

<sup>8</sup>The append operator ⊕ allows two lists to be combined, preserving the pre-existing order of the elements on the new list. Thus, ⟨*a*, *b*⟩ ⊕ ⟨*c*, *d*⟩ will yield ⟨*a*, *b*, *c*, *d*⟩. Ginzburg & Sag (2000: 170) define ⊖ as follows: "Here '⊖' designates a relation of contained list difference. If σ₂ is an ordering of a set Σ₂ and σ₁ is a subordering of σ₂, then σ₂ ⊖ σ₁ designates the list that results from removing all members of σ₁ from σ₂; if σ₁ is not a sublist of σ₂, then the contained list difference is not defined. For present purposes, ⊖ is interdefinable with the sequence union operator (○) of Reape (1994) and Kathol (1995): (σ₂ ⊖ σ₁ = σ₃) ⇔ (σ₁ ○ σ₃ = σ₂)." ○, which is called *shuffle*, is also explained in Müller (2021b: 391), Chapter 10 of this volume.

<sup>9</sup>But see Müller & Ørsnes (2013) for an analysis of pronoun shift in Danish assuming multiple subjects.


(3) Principle of Canonicality (adapted from Ginzburg & Sag 2000: 40):
$$\textit{sign} \Rightarrow \left[\text{SYNSEM}\ \textit{canonical}\right]$$

The principle ensures that full signs always have SYNSEM values of type *canonical*. (2) and (3) together make sure that only the *canonical* feature descriptions on the ARG-ST can (and must) appear on the SUBJ and COMPS lists of heads that are used in further combinations.<sup>10</sup> This, then, captures the idea that affixal arguments, which are of type *non-canonical*, are generally inert in the combinatorics of the syntax proper: they saturate an argument slot and that argument slot is no longer available for other (at least direct) syntactic combination.
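Schematically, the interaction of (2) and (3) can be rendered as a small procedural sketch. The following Python fragment is purely illustrative – the `Arg` class, the `realizations` function, and the label strings are inventions for exposition, not part of any HPSG formalization – and it simplifies the contained list difference by removing *all* non-canonical members from the prospective COMPS list:

```python
# A minimal sketch of the Argument Realization Principle in (2).
# ARG-ST is split relationally into a SUBJ part and a remainder; the
# COMPS value is the remainder minus its non-canonical members.
from dataclasses import dataclass

@dataclass
class Arg:
    label: str        # e.g. "NP[erg,1sg]"
    canonical: bool   # False for affixal arguments, gaps, etc.

def realizations(arg_st):
    """Enumerate the (SUBJ, COMPS) pairs the principle permits.
    Grammars usually also limit SUBJ to at most one element."""
    for k in range(len(arg_st) + 1):
        subj, rest = arg_st[:k], arg_st[k:]
        comps = [a for a in rest if a.canonical]
        yield subj, comps

# Macushi i-koneka-'pî-u-ya in (1a): both arguments are affixal
# (non-canonical), so the split with an empty SUBJ part leaves the
# word fully saturated: SUBJ = <> and COMPS = <>.
arg_st = [Arg("NP[erg,1sg]", False), Arg("NP[abs,3sg]", False)]
for subj, comps in realizations(arg_st):
    print([a.label for a in subj], [a.label for a in comps])
```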

So, returning to the Macushi word *ikoneka'pîuya* 'I made it.' from (1a), the relevant partial lexical description is given in (4):

(4) Lexical item for *ikoneka'pîuya* 'I made it.':
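The content of (4) can be sketched as the following partial AVM, based on the description below; only the PHON, valence, and ARG-ST attributes are shown:

$$\begin{bmatrix}\text{PHON}\ \left\langle \textit{ikoneka'pîuya} \right\rangle\\ \text{SS|LOC|CAT}\ \begin{bmatrix}\text{SUBJ}\ \langle\,\rangle\\ \text{COMPS}\ \langle\,\rangle\\ \text{ARG-ST}\ \left\langle \begin{bmatrix}\textit{non-canonical}\\ \text{LOC|CONT|IND}\ \boxed{1}\end{bmatrix},\ \begin{bmatrix}\textit{non-canonical}\\ \text{LOC|CONT|IND}\ \boxed{2}\end{bmatrix}\right\rangle\end{bmatrix}\end{bmatrix}$$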

In (4), the word's ARG-ST list comprises two *non-canonical* feature descriptions (corresponding to the maker argument, with index 1, and the made argument, with index 2). Yet, by the Argument Realization Principle, the SUBJ and COMPS lists are empty. Consequently, *ikoneka'pîuya* 'I made it.' can be a clause by itself – as it is in (1a) – because it requires no other valents (that is, it has a "saturation" level on a par with a clause) and it is headed by a verb (just like a clause is).

The specification of an empty list also contributes to ruling out examples like (1b) with a conominal. On the standard HPSG view of how valence is managed, no element with an empty valence list can combine with any possible valence-saturating syntactic entity, like an NP.<sup>11</sup> Thus, the grammar would correctly not license a tree like the one in Figure 1.<sup>12</sup>

<sup>10</sup>Ginzburg & Sag (2000: Section 5.1.3) and Abeillé & Godard (2007: 50) make use of the fact that gaps are admitted in the SUBJ list to account for *that*-trace effects in English and the *qui*/*que* distinction in French relative clauses. However, these gaps are just used to distinguish sentences with subject gaps from sentences without subject gaps. The verbs with gapped subjects never combine with them via the Head-Subject Schema.

Figure 1: A tree of an illicit conominal–pro-index combination

Overall, the Argument Realization Principle-mandated non-mapping of the *non-canonical* ARG-ST list members to either the SUBJ or the COMPS list captures the key behavior found in the pro-indexing type of argument indexing: the argument indexes occur in complementary distribution with any conominal.

# **2.2 Indexing co-occurring with conominals**

Even though the pro-index type of argument index has a more straightforward analysis, this type is not the most common in the world's languages. Rather, the most common type of argument indexing appears to be the one where the argument indexing affix(es) can co-occur with a conominal, but do(es) not have to. In Haspelmath's (2013) terms, this is the cross-index type.<sup>13</sup> A language exhibiting this type of behavior is Basque, as evident from the example in (5):

<sup>11</sup>To truly rule out the NP from combining with the verb in Figure 1, the NP would also need to not match any nonlocal requirements of the verb, since otherwise the combination in Figure 1 could be an instance of the Filler Head Schema. See Borsley & Crysmann (2021), Chapter 13 of this volume for an overview of analyses of nonlocal dependencies in HPSG.

<sup>12</sup>The tree in Figure 1, as well as other trees in this chapter, only provides the relevant attribute–value pairs, suppressing the geometry of features found in more articulated feature descriptions.

<sup>13</sup>The behavior of cross-indexes is canonical for so-called "pro-drop" languages, a term arising from the transformational syntax tradition (particularly from Chomsky 1981: 28, Section 4.3), but now with wider currency.


(5) Basque [eus] (Laka 1996: 98)
    Zuk niri liburua saldu d-i-da-zu.
    2SG.ERG 1SG.DAT book.DEF sold 3SG.ABS-AUX-1SG.DAT-2SG.ERG
    'You have sold me the book.'

Though *zuk* 'you', *niri* 'me', or even *liburua* 'the book' need not be present for the grammaticality of this sentence, this sentence (and language) exhibits cross-index behavior because, even though these conominals are present, the argument indexing affixes on the auxiliary *didazu* '3SG.ABS-AUX-1SG.DAT-2SG.ERG' still occur.

Unlike for pro-indexes, there is currently no standard HPSG analysis of cross-indexes.<sup>14</sup> Nevertheless, there are some possible approaches. I detail two in some depth here – what I will call the underspecification analysis and what I refer to as the direct syntax approach – and mention some other options near the end of the section.

### **2.2.1 Underspecification for cross-indexes**

On the underspecification analysis, the lexical descriptions of words containing argument indexes would have underspecified feature descriptions on their ARG-ST lists, corresponding to their argument indexes. These would then resolve depending on the syntactic context. Which portions of the feature description would be underspecified is a bit flexible (at least in the abstract) and depends on whether the analyst takes the "agreement" (argument indexing) to be more formal or more semantic in nature (see Wechsler 2021, Chapter 6 of this volume for a more thorough discussion of what is involved here). For the sake of illustration, I will employ a more semantic approach below.

Let us consider a word, like the Basque auxiliary *dut* 'AUX:3.ABS:1SG.ERG', that has a third-person singular absolutive argument index. Such a word might just be specified, by the constraints on the various lexical types of Basque, as in (6):<sup>15</sup>

<sup>14</sup>This may, in part, be a consequence of the standard way of managing predicate-argument relations in the syntax in HPSG: this management strategy is resource-sensitive – namely, once something is "cancelled" off a SUBJ or COMPS list, it no longer appears on any subsequent (higher) lists and cannot be used for other syntactic purposes. However, Section 2.2.3 will discuss some HPSG approaches where the management strategy is not so resource-sensitive.

<sup>15</sup>The lexical descriptions associated with *dut* in (6)–(8) all have a further argument – the verbal expression associated with the auxiliary – suppressed in these descriptions (with ellipses) because such a verbal argument (and its interaction with the other arguments) is not the focus of the analysis here.


$$\text{(6)}\quad \left[\text{SS|LOC|CAT|ARG-ST}\ \left\langle \begin{bmatrix}\textit{synsem}\\ \text{LOC|CONT|IND}\ \textit{3sg}\end{bmatrix}, \dots \right\rangle\right]$$

To be consistent with (6), the relevant ARG-ST list member just needs to be something that is semantically third person singular. Therefore, this ARG-ST list member could ultimately resolve to a *non-canonical* feature description, as in (7):

$$\text{(7)}\quad \left[\text{SS|LOC|CAT|ARG-ST}\ \left\langle \begin{bmatrix}\textit{non-canonical}\\ \text{LOC|CONT|IND}\ \textit{3sg}\end{bmatrix}, \dots \right\rangle\right]$$

This resolution would be forced when no conominal is present (if this *synsem* on the ARG-ST list resolved to the *canonical* type and no conominal were present, the COMPS list would illicitly remain non-empty). The analysis would, in this condition, be identical to that of the pro-indexes provided in Section 2.1.

However, the same ARG-ST list member could also ultimately resolve to a *canonical* feature description, as in (8):

$$\text{(8)}\quad \left[\text{SS|LOC|CAT|ARG-ST}\ \left\langle \begin{bmatrix}\textit{canonical}\\ \text{LOC|CONT|IND}\ \textit{3sg}\end{bmatrix}, \dots \right\rangle\right]$$

This resolution would be forced when a conominal is present (otherwise, the conominal could not be syntactically licensed). Thus, the analysis, in this condition, is like an instance of obligatorily co-present conominal and argument index (a gramm-index in Haspelmath's terms).
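The resolution logic of (6)–(8) can be summarized in a small Python sketch; the function and its string-valued types are inventions for exposition:

```python
# Resolution of an underspecified ARG-ST member, as in (6)-(8): with no
# conominal, it must resolve to non-canonical (the pro-index reading,
# (7)); with a conominal, it must resolve to canonical (the gramm-index
# reading, (8)), and the conominal's index must match.

def resolve(conominal_index=None, required_index="3sg"):
    if conominal_index is None:
        return "non-canonical"              # as in (7)
    if conominal_index != required_index:
        raise ValueError("conominal index clash: not licensed")
    return "canonical"                      # as in (8)

print(resolve())                            # 'non-canonical'
print(resolve(conominal_index="3sg"))       # 'canonical'
```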

As the discussion above indicates, there is a certain portion of this analysis that is not lexically mandated: the precise resolution of the argument depends on the specific syntactic expressions appearing in a particular clause. This analysis is also of the dual-nature type discussed by Haspelmath (2013): the argument index is treated as a pro-index when it has no conominal and as a gramm-index when a conominal is present. Other frameworks employ a similar analysis (LFG does, for instance – see Bresnan et al. 2016: Chapter 8). Haspelmath criticizes this approach for positing two distinct structural types for a single kind of affix; though, in the analysis above, it does not seem that the structural types are that radically different (observe that just one underspecified lexical description is associated with a given affixed form). Still, we might want to at least consider other options – and, in keeping with the tendency for multiple different approaches to be found within HPSG, there are some.

### **2.2.2 The "direct syntax" approach to cross-indexes**

Another approach to cross-indexes, proposed for Oneida in Koenig & Michelson (2015), takes the view that, in at least some languages, argument indexes always stand for arguments, and that the combination of a conominal with a verb bearing argument indexes is more purely semantically mediated, akin to a nominal expression combining with an already saturated narrow clause.<sup>16</sup> Koenig & Michelson call this the "direct syntax" approach (it is direct in the sense that the combinatorics are not mediated by any valence lists, which, arguably, would be a bit more "indirect").

As Koenig & Michelson (2015) discuss in detail, it appears that Oneida exhibits some interesting properties that make treating its argument indexing patterns in a (seemingly) rather different way much more plausible. For one, as shown in (9), the verb indexes all its arguments morphologically (except inanimates, like 'his axe' in (10)) – often with portmanteau affixes, as in (9) – making the case that the argument indexes are the actual arguments much stronger.

(9) Oneida [one] (Koenig & Michelson 2015: 5)
    wa-hiy-até·kw-a-ht-eʔ
    FACT-1SG>3.M.SG-flee-LNK.V-CAUS-PNC
    'I chased him away.'

Second, the evidence is equivocal about whether the language has any selection that cannot be treated as semantic selection.

Thus, on Koenig & Michelson's view (and in keeping with the terminology of the previous discussion), all the arguments correspond (at best) to *non-canonical* elements on the ARG-ST list, and thus there are never any head–argument combinations in the syntax. Any and all conominals are then licensed via index sharing between a nominal and an element on a NONLOCAL feature that Koenig & Michelson call DISLOC (see Koenig & Michelson 2015: 39 for discussion of why they consider this the best way to deal with the NONLOCAL feature), as shown in Figure 2, a tree of (10):

(10) Oneida [one] (Koenig & Michelson 2015: 17)
     ʌ-ha-hyoʔthi·yát-eʔ laoto·kʌ́·,
     FUT-3M.SG.A-sharpen-PNC his.axe
     'He will sharpen his axe,'

Figure 2: Licensing conominals on the direct syntax approach

Koenig & Michelson's (2015) discussion suggests that the direct syntax type might represent an extreme, occurring only in the most polysynthetic and nonconfigurational of languages, like Oneida and its Iroquoian kin. However, this claim remains an open question. Perhaps further study will reveal that this sort of analysis could profitably be employed in other kinds of languages.

<sup>16</sup>This analysis is perhaps the closest any HPSG analysis comes to the so-called Pronominal Argument Hypothesis (Jelinek 1984).

### **2.2.3 Other possibilities for cross-indexes**

In addition to the analyses discussed in the previous two subsubsections, there are a few more conceptual avenues that might be explored for the analysis of cross-indexes, though it is not clear that they have been fully explored in the literature yet (which might raise some questions as to their viability).

One route to explore would be to use lexical rules to create ARG-ST or valence lists containing feature descriptions corresponding to both the argument indexes *and* the conominals (something similar was explored for "clitics" in various Romance varieties by Monachesi 2005). Such a lexical rule might look as in (11):

$$\text{(11)}\quad \left[\text{SYNSEM|LOC|CAT|ARG-ST}\ \boxed{1} \oplus \left\langle \boxed{3}\,\begin{bmatrix}\textit{non-canonical}\\ \text{LOC|CONT|IND}\ \boxed{2}\end{bmatrix}\right\rangle\right] \mapsto \left[\text{SYNSEM|LOC|CAT|ARG-ST}\ \boxed{1} \oplus \left\langle \boxed{3},\ \begin{bmatrix}\textit{canonical}\\ \text{LOC|CONT|IND}\ \boxed{2}\end{bmatrix}\right\rangle\right]$$

This approach might be a way to loosen the resource sensitivity of the usual valence regime, though a proposal along these lines would need to take care in considering whether any changes would be needed in the statement of the Argument Realization Principle (and if so, what form they should take) and if there would be any undesirable consequences to allowing single semantic arguments to correspond to more than one syntactico-semantic element.
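Procedurally, a rule along the lines of (11) might be sketched as follows; the pair representation of ARG-ST members is an invention for exposition:

```python
# Sketch of the lexical rule in (11): from a word whose ARG-ST contains
# non-canonical (affixal) arguments, derive a word whose ARG-ST also
# contains, for each such argument, a canonical member sharing its
# index - the slot the conominal can fill.

def add_conominals(arg_st):
    out = list(arg_st)
    for typ, index in arg_st:
        if typ == "non-canonical":
            out.append(("canonical", index))   # index sharing
    return out

print(add_conominals([("non-canonical", "i")]))
# [('non-canonical', 'i'), ('canonical', 'i')]
```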

Another route to consider would be to relax the resource sensitivity in the syntax, instead of in the information associated with single words. Given proposals like the one in Bender (2008) for non-cancellation of arguments to deal with apparent cases of discontinuous constituency (more detailed discussion of this proposal is in Müller 2021b: Section 7, Chapter 10 of this volume),<sup>17</sup> something similar could also be explored for the cross-index type of argument indexing.

Overall, there seems to be a need for more exploration of cross-index behavior cross-linguistically within HPSG. Certainly, the above discussion shows that there is no shortage of possible analyses, but work remains to determine which of these is the best analysis overall, or which is best for which languages.

# **3 Non-accusative alignments**

Another area (in fact, not so distant from argument indexing in function) where understudied languages have enriched the general understanding of natural language morphosyntax is (morphosyntactic) alignment. Alignment concerns how the morphology of a language (if not also its syntax) groups together (or "aligns") different arguments into (what seem to be) particular grammatical relations (see Bickel & Nichols 2009 for an overview of alignment).<sup>18</sup> The most widespread alignment is the accusative one – familiar from ancient Indo-European languages and conservative modern-day ones – where subjects of transitive and intransitive verbs are treated differently from the objects of transitive verbs. Other recognized alignments include ergative (where subjects of transitive verbs are treated differently from the subjects of intransitives, which, in turn, pattern with the direct objects of transitives) (see Comrie 1978; Plank 1979; Dixon 1979; 1994, among others, for further discussion), split-S/active (where semantic agents and patients are treated differently) (see Klimov 1973; 1974; Dixon 1994: Chapter 4; Mithun 1991; Wichmann & Donohue 2008, among others, for further discussion), tripartite (where subjects of transitive verbs, subjects of intransitive verbs, and objects of transitive verbs are each treated differently) (Dixon 1994: 39–40), Austronesian alignment<sup>19</sup> (where arguments of various semantic roles can flexibly hold a privileged syntactic slot) (see Schachter 1976; Ross 2001; Himmelmann 2005 for more discussion), and hierarchical alignment (where elements of higher discourse salience are treated differently from elements of lower discourse salience) (see Jacques & Antonov 2014 for a good overview of what is involved).

<sup>17</sup>Also see like-minded proposals in Meurers (1999) and Müller (2008).

<sup>18</sup>Alignment can be explored both in head-marking and dependent-marking (Nichols 1986); however, having already focused on a kind of head-marking strategy in the previous section, I will focus on the corresponding dependent-marking strategy in this section.

<sup>19</sup>This kind of system is known by various names other than Austronesian alignment, including a symmetrical voice system, a Philippine-type voice system, or an Austronesian focus system.

Surveys from WALS (Comrie 2013a,b; Siewierska 2013a) indicate that accusative alignment is common worldwide,<sup>20</sup> and this seems to be even more true of languages with large numbers of speakers. Of the top 25 most widely spoken languages at present, arguably only a collection of languages from the Indian subcontinent (Hindi-Urdu, Marathi, and Gujarati) have non-accusative alignments, and even those are restricted to certain portions of their respective verbal systems (see Verbeke 2013: Chapter 7 for more on the patterns in these and other Indo-Aryan languages). Impressionistically, it seems that understudied languages do have a much stronger propensity for non-accusative alignments.

Because the non-accusative alignments at least seem to be rather different from accusative alignment, it is an interesting question how a given framework might handle these kinds of systems. In the majority of this section, I will focus on the analysis of ergative systems as a proof of concept (see, however, Drellishak 2009 for analyses of each of the non-accusative alignments, including the hierarchical type).<sup>21</sup>

In dealing with the analysis of ergative systems, it will be useful to divide the discussion into two parts. First, I will consider how particular morphological forms within NPs are licensed in instances when they co-occur with their governing verb – I will call this "the licensing of case in the syntax" (see also Przepiórkowski 2021, Chapter 7 of this volume). Second, I will consider how particular arguments come to be associated with particular morphological forms (whether realized or not) – I will call this "the licensing of case in linking" (see also Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). This division is not commonly recognized in most other frameworks; however, it does present itself as a possible division within HPSG, due to the separate ARG-ST and valence lists.

<sup>20</sup>However, since the surveys focus more on coding patterns rather than behavioral patterns (coding and behavioral in the sense of Keenan (1976) – "coding" related to morphological patterns or function words that signal a particular grammatical relation category; "behavioral" related to reference properties or patterning across clauses), it is possible that they underreport behavioral accusative patterns, even among languages that have so-called "neutral" coding patterns.

<sup>21</sup>There is also some discussion of an HPSG analysis of the ergative-aligned case system of the Caucasian language Archi – similar, in some respects, to the Lezgian examples I consider further on – in Borsley (2016), though the focus of that paper is much more on Archi's argument indexing system rather than its case system.


# **3.1 The licensing of case in the syntax**

The licensing of case in the syntax within HPSG is as straightforward in non-accusative alignments as it is in an accusative alignment system; the fundamentals are the same, regardless of alignment. This comes about due to the use of feature value matching (also known as "feature unification") for case licensing in the syntax.<sup>22</sup> The simple premise of feature value matching is that the value for a particular feature possessed by an argument and the feature value required by its head (for that same argument) must match. The nature of this analysis makes case licensing nearly identical – excepting the different values involved – to the selection of part-of-speech categories.<sup>23</sup>

To actually license case in the syntax with an ergative system, the key elements are (1) a feature for nominal expressions (call it CASE) and (2) appropriate values for CASE, like *ergative* and *absolutive*. Note that *ergative* and *absolutive* are types, so they can be potentially grouped with other case values into supertypes, like *structural* cases or *semantic* cases, if such groupings are relevant (as was done for the first time in Heinz & Matiasek 1994: 207). With those features in place, the rest of the analysis falls out through the larger theories of syntactic selection, featural identities, and syntactic combination: certain heads will require [CASE *erg(ative)*] and [CASE *abs(olutive)*] of their arguments. If certain potential arguments are just single words, the values for CASE of these words will straightforwardly match or not. If certain potential arguments consist of multiple words, independent constraints on HEAD value identity will ensure that the value for CASE will be identical between the head daughter and overall phrase (Abeillé & Borsley 2021: 22, Chapter 1 of this volume); constraints on the syntactic combination then ensure that the CASE values of the nominal expressions and the head requirements match.
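This matching logic can be made concrete with a short Python sketch; the miniature type hierarchy and helper names are inventions for exposition, with *ergative* and *absolutive* grouped under *structural* merely to show how supertype values could figure in requirements:

```python
# Case licensing as feature value matching, with CASE values as types:
# a requirement is satisfied by that type or any of its subtypes.

PARENT = {"erg": "structural", "abs": "structural"}

def subsumes(required, actual):
    """True if actual equals required or is a subtype of it."""
    while actual is not None:
        if actual == required:
            return True
        actual = PARENT.get(actual)
    return False

def licenses(head_case_reqs, arg_cases):
    """Each argument's CASE value must match the head's requirement."""
    return len(head_case_reqs) == len(arg_cases) and all(
        subsumes(r, c) for r, c in zip(head_case_reqs, arg_cases))

# A transitive verb requiring an ergative-absolutive frame, as the
# Lezgian verb in (12) below does:
print(licenses(["erg", "abs"], ["erg", "abs"]))         # True
print(licenses(["erg", "abs"], ["abs", "abs"]))         # False
print(licenses(["structural", "abs"], ["erg", "abs"]))  # True: supertype
```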

To see this with an actual example, let us consider the Lezgian sentence in (12):

<sup>22</sup>Feature value matching does have some conceptual similarity to the "feature checking" approach to case found in more recent Minimalist work (Chomsky 1991; 1993; Adger 2000; 2010; Frampton & Gutmann 2006; Pesetsky & Torrego 2007), though there are notable differences between the approaches, particularly that features are deleted in feature checking, but not in feature value matching. Borsley & Müller (2021: Section 3.5), Chapter 28 of this volume discuss problems that arise for feature checking approaches if values are needed more than once, e.g., in free relative clauses.

<sup>23</sup>Thus, to use terms more commonly associated with Mainstream Generative Grammar (i.e. work in Transformational Grammar, e.g., work in Government & Binding and Minimalism Chomsky 1981; 1995), HPSG views case licensing as (a specific kind of) c-selection (in the sense of Grimshaw 1979).


(12) Lezgian [lez] (Haspelmath 1993: 287)
     Aburu zun ajibda.
     3PL.ERG 1SG.ABS shame.FUT
     'They will shame me.'

The example in (12) could be analyzed with the tree in Figure 3.

Figure 3: Analysis of the Lezgian example *Aburu zun ajibda.* 'They will shame me.'

The tree in Figure 3 consists of two head–argument combinations, and in fact, the tree has the same geometry as in an accusative verb-final language (on standard assumptions about the constituency) – indeed, as the HPSG analysis does not intrinsically tie the analysis of case to constituency, the geometry of clauses would, all else being equal, not differ based on alignment alone. The most idiosyncratic aspect of Figure 3 is that the verb *ajibda* 'will shame' is one that requires an ergative–absolutive combination of arguments. Because the HPSG framework is feature-rich and formally rigorous in how feature values must be constrained within constituent structures, the licensing of case in the syntax in an HPSG analysis is very straightforward.

# **3.2 The licensing of case in linking**

Lurking behind the most idiosyncratic aspect of Figure 3 is the question of how particular heads come to have their particular argument requirements. This is, in fact, the question of how case is licensed in linking. As with other matters of non-accusative alignments, it seems that different alignments need not be treated in a wholly different fashion from each other: thus, the same kinds of analytic moves used for accusatively aligned systems could be used for non-accusatively aligned systems. That being so, it is probably too hasty to assume that there is a one-size-fits-all solution for the linking of case across all languages (regardless of alignment), as quite a few different factors appear to be important in different languages, among them at least verb class (that is, the classes related to the verbal lexical semantics), the semantic nature of the argument itself, the morphological form of the verb, and the subordination status of the clause headed by the verb (see, for example, discussion in Dixon 1994).

In all known ergative languages, the ergative–absolutive case pattern – clearly indicating that the subjects of transitive verbs are not encoded like the subjects of intransitive verbs – appears with "primary transitive verbs" (a term from Andrews 1985; 2007): predicates with the canonical meaning associated with transitive verbs where an initiating entity causes change in an undergoing entity. Given this basic generalization, a possible analysis of the arguments' case requirements with these "primary transitive verbs" would be through the constraint in (13):

(13) *trans-v-lxm* ⇒
$$\left[\text{SS|LOC}\ \begin{bmatrix}\text{CAT|ARG-ST}\ \left\langle \text{NP}[\textit{erg}]_{\boxed{1}},\ \text{NP}[\textit{abs}]_{\boxed{2}} \right\rangle\\ \text{CONT|KEY}\ \begin{bmatrix}\textit{act-und-rel}\\ \text{ACT}\ \boxed{1}\\ \text{UND}\ \boxed{2}\end{bmatrix}\end{bmatrix}\right]$$
In (13), the *transitive-verb-lexeme* (*trans-v-lxm*) has an ARG-ST list with both an ergative and an absolutive argument. Key to this result is that the verb lexeme is associated with an *actor-undergoer-relation* (*act-und-rel*); in fact, this is the value of the KEY feature in (13), encoding the designated semantic relation relevant for case and linking (see Koenig & Davis 2006 for more on the KEY feature). The *act-und-rel* type designates semantic predicates with precisely the denotation behind the notion of "primary transitive verb" (see Davis 2001: 75–134 for discussion of this type, other related types, and how these types fit into a hierarchy of semantic relations). Provided that the constraint in (13) is the only argument realization constraint to mention the ergative and absolutive cases, the ergative–absolutive collection of arguments would only be available with verbs with this particular sort of meaning.<sup>24</sup> The overall linking constraint is placed on *trans-v-lxm* so that (a) the case requirements are stated once for all the different inflected forms of the verb and (b) these case requirements can also be inherited by other semantically appropriate verbs with even more arguments than the two mentioned explicitly in (13).

As alluded to above, in other case "assignment" situations, other factors beyond just the semantics of the verb can be relevant (certainly in any instance of "split ergativity", among others). These could still be treated with constraints of a format similar to (13), but if they needed to refer to, say, just past tense verb forms, the relevant linking constraint would almost assuredly need to reference information from the morphology (perhaps encoded as part of a MORPH(OLOGY) attribute). Given the known claims about what non-accusative alignment can be sensitive to, it seems likely that the sign-based architecture (where all linguistic areas of structure can interact in parallel) would enable the straightforward statement of case constraints based on the previously claimed generalizations. And, in fact, having the possibilities of morphological form, semantics, and various syntactic properties easily available for an analysis could be useful as a means of testing and modeling which areas might be relevant in particular examples.
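As a sketch of how such a morphologically conditioned constraint might be stated, consider the following Python fragment; the tense-based split is hypothetical (splits of this general kind occur in Indo-Aryan; see Verbeke 2013), and the function and labels are inventions for exposition:

```python
# A hypothetical split-ergativity condition: the case frame required of
# a primary transitive verb (KEY value act-und-rel) depends on a
# morphological property of the verb form, here tense.

def case_frame(key, tense):
    if key == "act-und-rel":
        return ("erg", "abs") if tense == "past" else ("nom", "acc")
    raise NotImplementedError("other KEY relations need their own clauses")

print(case_frame("act-und-rel", "past"))     # ('erg', 'abs')
print(case_frame("act-und-rel", "present"))  # ('nom', 'acc')
```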

Overall, there is a lot still to be done to better understand the intricate details of case and linking generally, but given the toolbox available in the HPSG framework (again, see Przepiórkowski 2021, Chapter 7 of this volume and Davis, Koenig & Wechsler 2021, Chapter 9 of this volume), it seems like HPSG offers a lot of flexibility for better figuring out what linguistic elements are crucial for particular patterns and for encoding analyses that directly reference the interaction of these elements across different levels of structure.

<sup>24</sup>Though to achieve more generality with the licensing of absolutive case, one might follow Ball (2008: Chapter 7) and have separate linking constraints for absolutive and ergative case.


# **4 Verb–subject–object constituent order**

Let us turn to another interesting phenomenon of understudied languages: Verb–Subject–Object (VSO) constituent order.<sup>25</sup> VSO order appears to be the rarest of the more common orders. Various typology surveys (like Dryer 2013) indicate that it is found in only about 8–10% of the world's languages. Interestingly, a greater number of examples of languages with this order do come from the realm of understudied languages. Of the twelve understudied languages (a non-random sample<sup>26</sup>) mentioned in the introduction, five have VSO (or verb-initial with no strict ordering of S and O) order. Perhaps a bit more telling of the (apparent) understudied-language bias to VSO order is that only one of the top 50 languages by native speakers – Tagalog – reasonably clearly has VSO order.<sup>27</sup> Interestingly, VSO order does occur in a number of languages as a non-dominant word order: for instance, it is found in a great many western European languages (English, German, French, Spanish, among others) as a common order in questions.

In spite of its relative rarity as a dominant order, VSO order (as well as verb-initial order with flexible ordering of verbal dependents) poses some interesting challenges for frameworks that place some importance on constituency (as HPSG has done). Since the V and the O are not (normally) adjacent in VSO clauses, it is less than obvious that there is a constituent that groups them together (as a VP or a V<sup>0</sup>) in these languages. This contrasts with the more common Subject–Verb–Object and Subject–Object–Verb orders, where a constituent that groups the V and O together is much more plausible, on surface adjacencies alone. A long-standing question across constituency-based frameworks is how to best characterize VSO order, both on its own and in the context of the other cross-linguistically attested and common order patterns.

Analyses within Mainstream Generative Grammar have generally analyzed VSO as a derived order; all (or nearly all) of them (especially after the 1970s) have viewed VSO as a derived permutation of some constituent (or more) from a covert SVO order (see Clemens & Polinsky 2017 for an overview of the analyses within this tradition). Some of the suggested HPSG analyses follow a similar line of analysis, and I will briefly touch on those proposals below. However, most HPSG analysts have generally taken VSO order as is, and so I spend more of this section discussing two surface-oriented VSO analyses in HPSG: what I call the flat structure analysis and what I call the binary branching head-initial analysis.

<sup>25</sup>It would probably be clearer to refer to this order as Predicate–Agentive–Patientive order, because, as noted in the previous section, alignment and constituent order are, to a degree, disjoint. However, I will bow to tradition and use the terms verb, subject, and object and their abbreviations, V, S, and O.

<sup>26</sup>See footnote 3 for the rationale behind the choice of those twelve languages.

<sup>27</sup>Tagalog's distant Austronesian relatives Indonesian, Javanese, and Sundanese, along with Arabic – all four of these languages are also in the top 50 – had VSO order historically, and each of these languages preserves some instances of VSO order. As is often the case when looking at word orders and languages, things are rarely as cut and dried as they might otherwise seem.

# **4.1 The analogues of verb movement in HPSG**

Interestingly, there is not one, but two styles of HPSG analyses that are roughly analogous to Mainstream Generative Grammar's verb movement, commonly employed to derive VSO order. Both are discussed in greater detail in Müller (2021b), Chapter 10 of this volume, so my comments here will be somewhat superficial and will center around verb-initiality. The first of the two uses the DOUBLESLASH (DSL) feature (see Müller 2021b: Section 5.1, Chapter 10 of this volume) and so treats the initial verb as involved in a special dependency that uses a mechanism for percolation of information that is similar to the slash passing in nonlocal dependencies: information related to the initial verb is passed through the constituent structure to the verb's downstairs position (a trace, with semantic and syntactic structure, though no phonological realization). While this analysis has been explored for Germanic languages (see Figure 5 in Müller 2021b, Chapter 10 of this volume for a pictorial depiction of an analysis of an English verb-initial clause and Müller (2021c: Chapter 6) for an application to all Germanic V2 languages), I am not aware that it has (yet) been seriously explored in the HPSG literature for any particular verb-initial language (let alone for an understudied verb-initial language).

The other verb-movement-like analysis uses constituent order domains and linearization (see Müller 2021b: Section 6, Chapter 10 of this volume). On this analysis, the verb, while combined with its complements at a low level, is constrained at the clausal level to be initial (see Borsley 2006 for more discussion of this in a verb-initial context). This style of analysis has been closely and carefully considered for Welsh in work by Borsley (for example in Borsley 1989; 1995; 2009), but time and again, it seems that Borsley suggests that an analysis (at least for the basics of clausal structure) more in line with what is discussed in Section 4.2 is to be preferred for Welsh.

Given the rarity (and perhaps reluctance) – noted above – of HPSG researchers to analyze VSO order as covertly SVO (or, more to the point, to recognize a constituent that groups together the V and O within VSO structures), one might wonder why this has (hitherto) been so. Probably, HPSG's surface orientation has played a role, as has the fact that HPSG-internal considerations do not force or strongly suggest positing a VP constituent. Furthermore, HPSG analysts have also carefully considered how constituency tests might inform such structures. In exploring these, various HPSG researchers (such as Borsley 2006 for Welsh and Ball 2008: Chapter 3 for Tongan) have not found compelling evidence for positing a VP constituent in particular VSO languages.<sup>28</sup> For instance, Ball, looking at Tongan, found that putative VP-coordination "over" a subject is not possible; that no auxiliary or verb obviously subcategorizes for a verbal constituent that obviously excludes its subject, nor do adverbial elements obviously select for such a constituent; and that, while "VP-fronting" and "VP-ellipsis" are possible, they seem to involve NPs rather than VPs. While these facts do not definitively rule out a VP (it is difficult to argue that anything is clearly absent), they suggest that not positing a VP does not complicate the grammar of this kind of language. Undoubtedly, it would be interesting to see what further explorations like these with more verb-initial languages might reveal. Still, the VSO-as-covert-SVO analysis may lie on shakier grounds empirically than analyses within Mainstream Generative Grammar have generally acknowledged, and this, explicitly or implicitly, has led HPSG analysts to explore other avenues in the analysis of VSO languages.

# **4.2 The flat structure analysis**

The seemingly most common analysis of VSO languages in HPSG is the flat structure analysis.<sup>29</sup> As its name suggests, the proposed structure is flat, with the verb, subject NP, and any complement NPs all being sisters within the same constituent. To license such a structure, one has to depart from rules that put just heads and complements or just heads and subjects together. The flat structure analysis instead makes use of what I call the Head-All-Valents Schema (also sometimes called the Head-Subject-Complements Schema), given in (14):

<sup>28</sup>The undermotivated VP in Welsh is probably just a VP headed by a finite verb, as Welsh does give evidence for non-finite VPs (Borsley et al. 2007). In other languages, like Tongan, the undermotivated VPs might include both finite and non-finite VPs. As has emerged in the study of verb-initial languages in several frameworks, these languages might not be as structurally uniform as the term "verb-initial languages" suggests.

<sup>29</sup>There are, in fact, several alternative flat structure analyses, differing slightly in how the head's valence features relate to the structure. Besides having the head combine with its subjects and complements simultaneously, as in the main text, one variant has all the arguments as complements and, thus, VSO order arises out of a head-complements structure. Borsley (1995) suggests that different languages might utilize different variants: in particular, Borsley suggests that Syrian Arabic uses the head-subject-complements combination while Welsh uses the head-complements combination. Still other analysts (such as Ball 2008; 2017) assume just one valence feature VAL instead of SUBJ and COMPS and are similar to the subjects-as-complements approach in combining all arguments with the head at once.

(14) Head-All-Valents Schema:

$$\textit{head-all-valents-phrase} \Rightarrow \begin{bmatrix}\text{SYNSEM|LOC|CAT}\ \begin{bmatrix}\text{SUBJ}\ \langle\rangle\\ \text{COMPS}\ \langle\rangle\end{bmatrix}\\ \text{HD-DTR}\ \begin{bmatrix}\textit{word}\\ \text{SYNSEM|LOC|CAT}\ \begin{bmatrix}\text{SUBJ}\ \left\langle\boxed{1}\right\rangle\\ \text{COMPS}\ \left\langle\boxed{2}, \dots, \boxed{n}\right\rangle\end{bmatrix}\end{bmatrix}\\ \text{NON-HD-DTRS}\ \left\langle\left[\text{SYNSEM}\ \boxed{1}\right], \left[\text{SYNSEM}\ \boxed{2}\right], \dots, \left[\text{SYNSEM}\ \boxed{n}\right]\right\rangle\end{bmatrix}$$

Per its name, it licenses a fully saturated phrase comprising a head – a single word – and all its valents (subject, object, and whatever else). This schema has been used not just for canonical VSO clauses within HPSG, but also for other clause-level head-initial structures, including polar questions in English. Thus, this schema has a long pedigree in the HPSG literature (compare the schema in (14) with Schema 3 from Pollard & Sag 1994: 40; *sai-ph* from Ginzburg & Sag 2000: 36; and *aux-initial-cxt* from Sag 2012: 188).
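Reduced to a licensing check, the schema can be sketched in Python as follows, with synsem descriptions simplified to strings and identity of descriptions standing in for structure sharing; the names are inventions for exposition:

```python
# The Head-All-Valents Schema in (14) as a licensing check: a lexical
# head combines with all of its valents at once, and the mother's
# valence lists come out empty.

def head_all_valents(head, daughter_synsems):
    """head: dict with SUBJ and COMPS lists; daughter_synsems: the
    synsems of the non-head daughters, in order."""
    if head["SUBJ"] + head["COMPS"] == daughter_synsems:
        return {"SUBJ": [], "COMPS": []}   # fully saturated phrase
    return None                            # schema does not apply

# A verb like Kimaragang minangalapak 'split' in (15) below, taking a
# nominative and a genitive NP (treating the nominative as the SUBJ):
verb = {"SUBJ": ["NP[nom]"], "COMPS": ["NP[gen]"]}
print(head_all_valents(verb, ["NP[nom]", "NP[gen]"]))  # saturated
print(head_all_valents(verb, ["NP[nom]"]))             # None: valent missing
```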

To see an example using the Head-All-Valents Schema, let us consider example (15) from Kimaragang:

(15) Kimaragang [kqr] (Kroeger 2010: 7)
     Minangalapak     it   kogiw      do   ratu.
     PST.AV.TR.split  NOM  orangutan  GEN  durian
     'The orangutan split (open) a durian.'

By the Head-All-Valents Schema (and appropriate inherited constraints concerning the featural identities of HEAD values), a tree for (15) would be as in Figure 4. To license the tree in Figure 4, we should first observe that the verb *minangalapak* 'split' appears to require both a nominative and a genitive argument. With two such nominal expressions fitting those requirements available, the Head-All-Valents Schema can put all three of these elements – the verb and two NPs – together, and the resulting mother node's SUBJ and COMPS lists would be empty.

In spite of the flatness of Figure 4, the structure is like all head-nexus combinations in HPSG: a head and (at least some of) its dependents. In fact, Figure 4 is identical to certain verb phrases headed by a ditransitive verb (on some HPSG analyses) – just a verb and two NPs.


Figure 4: The flat structure analysis of Kimaragang *Minangalapak it kogiw do ratu.* 'The orangutan split (open) a durian.'
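In skeletal form, the licensed configuration is the following (a sketch only; the internal structure of the NPs and the inherited HEAD-value identities are suppressed):

```
                 S[SUBJ ⟨⟩, COMPS ⟨⟩]
        ┌─────────────────┼─────────────────┐
  V[SUBJ ⟨ 1 ⟩,          1 NP              2 NP
    COMPS ⟨ 2 ⟩]         it kogiw          do ratu
  Minangalapak
```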

Furthermore, the flat nature of the structure is less of a concern than it would be under c-command-based proposals (which are the off-the-shelf analyses in Mainstream Generative Grammar): binding relations in HPSG are not calculated from the configurations within the tree, but from configurations on the ARG-ST list (see Müller 2021a, Chapter 20 of this volume). Other subject-object and agent-patient asymmetries (to the extent they exist) are likewise encoded in HPSG analyses using non-configurational data structures and do not seem to be relevant for determining constituency.

Additionally, assuming a flatter structure for VSO/verb-initial languages eases the analysis of several other phenomena (especially versus a treatment of the same data with a VP constituent). In verb-initial languages where the order of elements following the verb is flexible (as in Tongan, among others), having all arguments together with the verb as part of a single constituent allows for such "scrambling" to be analyzed with simple linear precedence constraints within that constituent, rather than having to deal with different orders across a VP boundary (see the analysis in Ball 2008: Chapter 3 for Tongan and Müller 2021b, Chapter 10 of this volume for other HPSG approaches to "scrambling"). There are also a few languages like Coast Tsimshian, where, somewhat surprisingly, morphological marking on one syntactic item refers to the next constituent over. An example of this phenomenon is given in the Coast Tsimshian sentence in (16), where the second line employs brackets to better show which elements are related to which others:


(16) Coast Tsimshian (Sm'algyax) [tsi] (Mulder 1994: 32)
     Yagwat huumda duusa hoon.
     Yagwa-t     huum-[da       duus]-[a      hoon]
     CONT-3.ERG  smell-[ERG.CN  cat]-[ABS.CN  fish]
     'The cat is sniffing the fish.'

It is far more straightforward to analyze the apparent sideways relationships when the interacting elements are sisters, rather than to manage the relationships across a VP boundary (and possibly other constituent boundaries) (see Ball 2011 for an in-depth look into this syntactic phenomenon in Coast Tsimshian and an analysis of it).

# **4.3 The binary branching head-initial analysis**

Another approach in HPSG to VSO structures takes the view that all verb-headed structures within the clause are maximally binary branching, but strongly head-initial. This approach still has a strong surface orientation – so it does not take the VSO order to be covertly SVO or SOV – but does posit that more structure is present within a clause than on the flat structure analysis.

On the binary branching head-initial analysis, VSO clauses are built out of several instances of a single rule. The rule, which I call the Head-Valent Schema, is given in (17):<sup>30</sup>

(17) Head-Valent Schema (binary branching):

*head-valent-phrase* ⇒
  SYNSEM|LOC|CAT|VAL  1
  HD-DTR              [ SYNSEM|LOC|CAT|VAL ⟨ 2 ⟩ ⊕ 1 ]
  NON-HD-DTRS         ⟨ [SYNSEM 2 ] ⟩

The rule in (17) allows a head to combine with just one of its valents; in particular, the first one on its VAL list. This aspect of the ordering is crucial to ensure that the subject-NP-before-object-NP sequence is licensed.

<sup>30</sup>The Head-Valent Schema here is designed to implement the Categorial Grammar analysis of Keenan (2000) in HPSG terms, and, as such, uses a single VALENCE list, abbreviated VAL. So, for the discussion in this section, I will employ this slightly different feature geometry. Note that the configuration of Figure 5 could also be achieved using the SUBJ and COMPS lists found elsewhere in this chapter (although it requires two rules instead of just one). Another option would be to include the subjects among the complements as is done for finite verbs in Welsh (Borsley 1989: 347; 1995: 117–118) and German (Pollard 1996: 295). As has been a recurring theme throughout this chapter, many analyses are possible and more empirical work is needed to see which might be preferred.


Returning to the Kimaragang example of (15), we can see how a structure licensed by the Head-Valent Schema in (17) differs from a structure licensed by the Head-All-Valents Schema. The structure licensed by the Head-Valent Schema (and relevant inherited constraints) is given in Figure 5.

Figure 5: The binary branching head-initial analysis of the Kimaragang example in (15) *Minangalapak it kogiw do ratu.* 'The orangutan split (open) a durian.'
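In skeletal form (a sketch only, using the single VAL list of footnote 30 and suppressing the internal structure of the NPs):

```
                S[VAL ⟨⟩]
           ┌────────┴────────┐
    V′[VAL ⟨ 3 ⟩]           3 NP
       ┌───────┴───────┐     do ratu
  V[VAL ⟨ 2 , 3 ⟩]    2 NP
  Minangalapak        it kogiw
```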

As in Figure 4, the head verb in Figure 5 requires a nominative and a genitive argument. However, instead of combining with both of these at the same time, the verb just combines with the initial nominative argument ( 2 ), leaving the genitive argument ( 3 ) to be passed up to the mother. At this second level of structure, the Head-Valent Schema again applies – because there is still at least one element on the relevant head's VAL list – integrating 3 into the structure. The rule is barred, correctly, from applying to the root node of the tree in Figure 5, as this root node has an empty VAL list and the Head-Valent Schema requires the head daughter to have at least one valent.

A noteworthy feature of the VSO binary branching head-initial analysis is its grouping of the verb and the subject NP into a constituent. Exactly such verb–subject constituents have been reported in some verb-initial languages, like Malagasy (Keenan 2000), suggesting the binary branching head-initial analysis might be preferable


for such languages. In VSO languages without such evidence, it would seem that either the flat structure analysis or the binary branching head-initial analysis would be possible, all else being equal.

If one is accustomed to seeing the trees from Mainstream Generative Grammar, the structure in Figure 5 may still seem strange (notably, the structural prominence relationships between what seems to be the subject NP and what seems to be the object NP are reversed). Nevertheless, many of the same kinds of comments made for the flat structure analysis hold here as well. The structure in Figure 5 is just a series of head-argument structures, the most common kind of structure in HPSG. And, once again, the non-configurational approach to binding in HPSG renders any issues related to tree configuration and binding irrelevant.

Both approaches to VSO order discussed above do raise interesting questions about whether there are any underlying grammatical principles, processing preferences, or historically-driven outcomes behind the patterns. For the binary branching head-initial analysis, there is a question as to why the required order of combination goes from least oblique to most oblique. A similar set of questions can be leveled at the flat structure analysis: what inhibits a more constituent-rich structure? Why are flat structures licensed here and not elsewhere? To my knowledge, these questions have yet to be tackled within the HPSG literature, but they do seem to be reasonable next steps, in addition to better seeing which analyses are appropriate for which verb-initial languages.

# **5 Wrapping up**

In general, HPSG practitioners have been fairly conservative in what they assume to be universal in syntax: since there is no core assumption in HPSG that a particular rich, innate, and universal class of structures helps children learn any language (Mainstream Generative Grammar's Universal Grammar), proposals can be (and are) made that are agnostic as to universality. Even so, the brief trip made in this chapter through argument indexing, non-accusative alignments, and verb-initial constituent order found in understudied languages reveals that the more dependency-oriented portions of the framework – in particular, the areas encoded in the SUBJ, COMPS, and ARG-ST lists – are useful for the analysis of all three of these areas, across different languages, and, thus, are candidates for universality<sup>31</sup> (though the current level of understanding does not clearly point

<sup>31</sup>Or in the case of the SUBJ and COMPS lists, a candidate for near-universality, as Koenig & Michelson (2015) argue that Oneida does not require such lists.

#### 5 HPSG in understudied languages

to them originating from either a rich language-specific part of cognition or from general cognition). Furthermore, the explorations above show that the rich and precise modeling using attribute-value matrices also allows for uniform sorts of analyses, even though the details may differ.<sup>32</sup> While the precise attributes and feature values may not themselves be candidates for universality, they certainly aid in the enterprise of exploring different analyses and determining what precisely must be said to capture certain linguistic phenomena.

In addition to revealing some of the more uniform aspects of HPSG, the above discussion also reveals a certain flexibility in how the framework can be deployed – several analyses might be possible and certain ones might be more appropriate for certain languages and not for others. Thus, on top of a uniform foundation, various languages and phenomena are open to be analyzed in their own terms, dependent on what the specific empirical facts reveal. This mesh of uniformity and parochiality in HPSG analyses seems to strike a good balance as grammarians try to capture the two (somewhat paradoxical) realities one finds when comparing across languages: languages are both surprisingly similar and surprisingly different.

# **Acknowledgments**

Thanks to Stefan Müller, Robert Borsley, and Emily Bender for comments on this chapter. I alone am responsible for any remaining shortcomings. Thanks also to the editors, especially to Jean-Pierre Koenig (who served as the editors' in-person representative when I was first approached to write this chapter), for suggesting that a chapter like this one exist in this handbook.

# **References**

Abbott, Miriam. 1991. Macushi. In Desmond C. Derbyshire & Geoffrey K. Pullum (eds.), *Handbook of Amazonian languages*, vol. 3, 23–160. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110854374.

<sup>32</sup>Two projects within the HPSG community have explored in-depth how uniform particular HPSG analyses of different languages might be. The Grammar Matrix project (Bender et al. 2010) just starts from a common core and adds language-specific elements as needed; the Core-Gram project (Müller 2015) actively tries to use the same sorts of data structures for as many languages as possible within the project. Both projects develop computer-processable grammars. For more on these projects and the relation between HPSG and computational linguistics in general see Bender & Emerson (2021), Chapter 25 of this volume.


2nd edn., vol. 1, 132–223. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511619427.003.




Dixon, Robert M.W. 1979. Ergativity. *Language* 55(1). 59–138. DOI: 10.2307/412519.



*German in Head-Driven Phrase Structure Grammar* (CSLI Lecture Notes 46), 199–236. Stanford, CA: CSLI Publications.




# **Part II**

# **Syntactic phenomena**

# **Chapter 6**

# **Agreement**

# Stephen Wechsler

The University of Texas at Austin

Agreement is modeled in HPSG by assigning agreement features such as person, number, and gender ("phi features") to specified positions in the feature structures representing the agreement trigger and target. The locality conditions on agreement follow from the normal operation of the grammar in which those phi features are embedded. In anaphoric agreement, phi features appear on referential indices; in verb agreement, phi features appear on the verb's ARG-ST list items; and in modifier agreement, phi features appear on the MOD value of the modifier. Selective underspecification of agreement features accounts for the alternation between formal and semantic agreement. Within the HPSG framework, long-distance agreement has been analyzed as anaphoric agreement in a special clausal construction, while superficial agreement has been modeled using linearization theory.
Stephen Wechsler. 2021. Agreement. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 219–244. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599828

# **1 Introduction**

Agreement is the systematic covariation between a semantic or formal property of one element (called the agreement *trigger*) and a formal property of another (called the agreement *target*). In the sentences *I am here* and *They are here*, the subjects (*I* and *they*, respectively) are the triggers; the target verb forms (*am* and *are*, respectively) covary with them. Research on agreement systems within HPSG has been devoted to describing and explaining a number of observed aspects of such systems. Regarding the grammatical relationship between the trigger and the target, we may first of all ask how local that relationship is, and in what grammatical terms it is defined. Having determined the prevailing locality conditions on agreement in a given language, we attempt to explain observed exceptions, that is, cases of apparent "long-distance agreement", as well as cases


of superficial agreement defined on string adjacency. Agreement features across languages include person, number, and gender (known as *phi* features), as well as deictic features and case, but various different subsets of those features are involved in particular agreement relations. How can we explain the distribution of features? How are locality and feature distribution related to the diachronic origin of agreement systems? Also, as indicated in the definition of agreement provided in the first sentence of this paper, the features of the target are sometimes determined by the trigger's form and sometimes by its meaning. What regulates this choice? In some cases a single trigger in a sentence determines different features on two different targets. Why does such "mixed agreement" exist, and what does its existence tell us about the grammatical representation of agreement? This chapter reviews HPSG approaches to these questions of locality, grammatical representation, feature distribution, diachrony, semantic versus formal agreement, and mixed agreement. Agreement with coordinate phrases is discussed by Abeillé & Chaves (2021: Section 4.2), Chapter 16 of this volume.

HPSG offers an integrated account of these phenomena. In most cases the analysis of agreement phenomena does not involve any special formal devices dedicated to agreement, comparable to the *probe* and *goal*, or the AGREE relation, found in Minimalist accounts (Chomsky 2000). Instead, the observed agreement phenomena arise as a side effect of other grammatical mechanisms responsible for valence saturation, the semantics of modification, and coreference.

# **2 Modeling agreement relations**

Constraint-based formalisms such as HPSG are uniquely well-suited for modeling agreement. Within such formalisms, agreement occurs when multiple feature sets arising from distinct elements of a sentence specify information about a single abstract object, so that the information must be mutually consistent (Kay 1984). The two forms are said to agree when the values imposed by the two constraints are compatible, while ungrammaticality results when they are incompatible. For example the English verb *is* in (1) specifies that its initial ARG-ST list item,<sup>1</sup> which is identified with the SUBJ list item, has third person, singular features. In the mechanism of valence saturation, the NP list item in the value of SUBJ unifies with the feature description representing the SYNSEM value of the subject NP. The features specified by the verb for its subject and by the subject NP must be compatible; otherwise the representation for the resulting sentence is ill-formed, predicting ungrammaticality as in (3a).

<sup>1</sup>See Abeillé & Borsley (2021: Section 4), Chapter 1 of this volume for an introduction covering argument structure (ARG-ST) and valence features. Davis, Koenig & Wechsler (2021), Chapter 9 of this volume deal more intensively with ARG-ST and linking.


(1) Simplified lexical sign for the verb *is*:
     PHON    *is*
     SUBJ    ⟨ 1 ⟩
     ARG-ST  ⟨ 1 NP[ PER *3rd*, NUM *sg* ], … ⟩


(2) Simplified lexical signs for *I* and *she*:
     *I*:    NP[ PER *1st*, NUM *sg* ]
     *she*:  NP[ PER *3rd*, NUM *sg*, GEND *fem* ]


(3) a. \* I is sober.

b. She is sober.

The features supplied by the trigger and target must be consistent, but there is no general minimum requirement on how many features they specify. Both of them can be, and typically are, underspecified for some agreement features. For example, gender is not specified by the verb in (1) or the first pronoun in (2).

The representation of an agreement construction is the same regardless of whether a feature originates from the trigger or the target. This immediately accounts for common agreement behavior observed when triggers are underspecified (Barlow 1988). For example, Serbo-Croatian is a grammatical gender language, where common nouns are assigned to the masculine, feminine, or neuter gender. The noun *knjiga* 'book' in (4) is feminine, so the modifying determiner and adjective appear in feminine form (Wechsler & Zlatić 2003: 4).

(4) Ov-a           star-a        knjig-a         stalno  pad-a.     (Serbo-Croatian)
    this-NOM.F.SG  old-NOM.F.SG  book(F)-NOM.SG  always  fall-3SG
    'This old book keeps falling.'

However, some nouns are unspecified for gender, such as *sudija* 'judge'. Interestingly, the gender of an agreeing adjective actually adds semantic information, indicating the sex of the judge (Wechsler & Zlatić 2003: 42, example (23)).



(5) b. Ta      stara  sudija  je   dobro  sudila.
       that.F  old.F  judge   AUX  well   judged.F
       'That old (female) judge judged well.'

Here the gender feature comes from the targets instead of the trigger. This illustrates an advantage of constraint-based theories like HPSG over transformational accounts in which a feature is copied from the trigger, where it originates, to the target, where it is then realized. The usual source of the feature (the noun) lacks it in (5), a problem for the feature-copying view.

The same problem occurs even more dramatically in *pro*-drop languages. Many languages allow subject pronouns to drop, and distinguish person, number, and/or gender on the verb. If those features originate from the null subject, then there would have to be distinct null pronouns, one for each verbal and predicate adjective inflection (Pollard & Sag 1994: 64). This would be more complex and stipulative, and moreover the paradigm of putative null pronouns would have to exactly match the set of distinctions drawn in the verb and adjective systems, rather than reflecting the pronoun paradigm. HPSG avoids this suspicious assumption. Null anaphora is modeled by allowing the *pro*-dropped argument to appear on the ARG-ST list but not a valence list like SUBJ or COMPS (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). For example, in the context given in (6) a Serbo-Croatian speaker could omit the subject pronoun.

(6) Context: Speaker comes home to find her bookcase mysteriously empty.
    Gde    su   (one)      nestale?
    where  did  they.F.PL  disappear.F.PL
    'Where did they (i.e. the books) go?'

The sign for the inflected participle specifies feminine plural features on the initial item in its ARG-ST list. The SUBJ list item is optional:<sup>2</sup>

(7) Simplified lexical sign for the participle form *nestale*:

     PHON    *nestale*
     SUBJ    ⟨ ( 1 ) ⟩
     COMPS   ⟨⟩
     ARG-ST  ⟨ 1 NP[ NUM *pl*, GEN *fem* ] ⟩

<sup>2</sup>See also Müller & Ghayoomi (2010: 465) for an analysis along these lines for *pro*-drop in Persian.


The feminine plural features are specified regardless of whether the subject pronoun appears. When the pronoun is dropped we have the usual underspecification, only in this case the trigger does not exist, so it is effectively fully underspecified, realizing no features at all.

# **3 Locality in agreement**

# **3.1 Argument and modifier agreement**

In HPSG, the grammatical agreement of a predicator with its subject or object, or an adjective, determiner, or other modifier with its head noun, piggy-backs on the mechanism of valence saturation and modification. Agreement is encoded in the grammar by adding features of person, number, gender, case, and deixis to the existing feature descriptions involved in syntactic and semantic composition. This simple assumption is sufficient to explain the broad patterning of distribution of agreement, in contrast to the transformational approach where complex locality conditions must be stipulated (see also Borsley & Müller 2021: Section 3.3, Chapter 28 of this volume).

In HPSG, predicate-argument agreement arises directly from the valence saturation, as illustrated already in (1) above. Thus the locality conditions on the trigger-target relation follow from the conditions on the subject-head or complement-head relation. Similarly, attributive adjectives agree with nouns directly through the composition of the modifier with the head that it selects via the MOD feature. For example, the Serbo-Croatian feminine adjective form *stara* 'old.F' in (5b) specifies feminine singular features for the common noun phrase (N<sup>0</sup> ) that it modifies, which is captured by the representation in (8):

(8) Simplified lexical sign for *stara* 'old.F':

```
PHON  stara
MOD   [ HEAD   noun
        COMPS  ⟨⟩
        NUM    sg
        GEND   fem ]
```
In head-adjunct phrases, the MOD value of the adjunct daughter is token-identical with the SYNSEM value of the head daughter. So *stara*'s feminine singular features cannot conflict with the features of the noun it modifies (see also Van Eynde 2021: Section 2.1, Chapter 8 of this volume).


The predicted locality conditions are also affected by the percolation of features from words to phrasal nodes, and this depends on the location of the features within the feature description. Agreement features of the *trigger* appear either within the HEAD value or the semantic CONTENT value (these give rise to CONCORD and INDEX agreement, respectively; see Section 4.2). In either case these features percolate from the trigger's head word to its maximal phrasal projection, due to the Head Feature Principle (Abeillé & Borsley 2021: 22, Chapter 1 of this volume) in the former case and the Semantics Principle (Koenig & Richter 2021, Chapter 22 of this volume) in the latter. For example the noun phrase *the books* shares its NUM value with the NUM value of its head *books* and hence it is *pl*. This determines plural agreement on a verb: *These books are/\*is interesting.* Apparent exceptions, where a target seems to fail to agree with the head of the trigger, are discussed on p. 233 below.
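For instance, in a sketch of *the books* (with the determiner's contribution and all other features suppressed), the NUM value of the head noun is shared with the phrasal node:

```
      NP[NUM pl]
     /          \
  the            N[NUM pl]
                 books
```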

However, agreement features of the *target* appear in neither the HEAD nor the CONTENT value of the target form, but rather appear embedded in an ARG-ST list item or MOD features. So agreement features of the target do not project to the target's phrasal projection such as VP, S, or AP. This is a welcome consequence. If the subject agreement features of the verb projected to the VP, for example, we would expect to find VP-modifying adverbs that consistently agree with them, but we do not.<sup>3</sup>

# **4 Varieties of agreement target**

# **4.1 Anaphoric agreement**

In anaphoric agreement, an anaphoric pronoun agrees in person, number, and gender with its antecedent. Since Pollard & Sag (1992; 1994), anaphoric agreement has been analyzed in HPSG by assuming that person, number, and gender are formal features of the referential index associated with an NP. Anaphoric binding in HPSG is modeled as coindexation, i.e. sharing of the INDEX value, between the binder and bindee. Thus any specifications for agreement features of the INDEX contributed by the binder and bindee must be mutually consistent. In (9), Principle A of the Binding Theory (Müller 2021a, Chapter 20 of this volume) requires the reflexive pronoun to be coindexed with an o-commanding item, here the subject pronoun:

<sup>3</sup>VP-modifying secondary predicates sometimes agree with their own subjects. What we do not find are adjuncts that consistently agree with the subject agreement features of the VP even when the adjunct is not predicated of that subject.


(9) a. She admires herself.

b. *admire*:

$$
\left[\text{ARG-ST}\ \left\langle\ \text{NP:}\begin{bmatrix}\textit{ppro}\\[2pt]\text{INDEX}\ \boxed{1}\begin{bmatrix}\text{PER}\ \textit{3rd}\\\text{NUM}\ \textit{sg}\\\text{GEND}\ \textit{fem}\end{bmatrix}\end{bmatrix},\ \text{NP:}\begin{bmatrix}\textit{ana}\\[2pt]\text{INDEX}\ \boxed{1}\end{bmatrix}\ \right\rangle\right]
$$

The agreement features are formal features and not semantic ones, but the semantic correlates of person (speaker, addressee, other), number (cardinality), and gender (male, female, inanimate, etc.) are invoked under certain conditions (described in Section 5). Thus INDEX agreement is distinct from *pragmatic agreement*, whereby semantic features of two coreferential expressions must be semantically consistent in order for them to refer to a single entity. INDEX agreement is enforced only within the syntactic domain defined by Binding Theory, while pragmatic agreement applies everywhere. For example, feminine pronouns are sometimes used for ships, in addition to neuter pronouns. Whichever gender is chosen, it must be consistent in binding contexts (example based on Pollard & Sag's 1994: 79 example (46a)):

	- b. The ship lurched, and then she righted herself. It is a fine ship.
	- c. \* The ship lurched, and then it righted herself.
	- d. \* The ship lurched, and then she righted itself.

The bound reflexive must agree formally with its antecedent, while other coreferential pronouns need not agree, as they are not coarguments of the antecedent and not subject to the structural Binding Theory.

In grammatical gender languages, where common nouns are conventionally assigned to a gender, an anaphoric pronoun appearing outside the binding domain of its antecedent can generally agree with that antecedent either formally or, if semantically appropriate (e.g. for an animate, sexed entity), pragmatically. In most situations pronouns allow either pragmatic or INDEX agreement with their antecedents. For example, pronouns coreferential with the Serbo-Croatian grammatically neuter diminutive noun *devojče* 'girl' can appear in either neuter or feminine gender (from Wechsler & Zlatić 2003: 198):<sup>4</sup>

<sup>4</sup>See also Müller (1999: Section 20.4.4.) for a discussion of similar cases in German and of problems for HPSG's Binding Theory.


(11) a. Ono      je      htelo        da    telefonira.
        it.N.SG  AUX.SG  wanted.N.SG  that  telephone
     b. Ona       je      htela        da    telefonira.
        she.F.SG  AUX.SG  wanted.F.SG  that  telephone
        'This little girl came in. She wanted to use the telephone.'

The neuter pronoun in (11a) reflects INDEX agreement with the antecedent while the feminine pronoun (11b) reflects its reference to a female (pragmatic agreement). But when a reflexive pronoun is locally bound by a nominative subject, agreement in formal INDEX features is preferred:

(12) Devojče        je        volelo      samo          / ?* samu      sebe.
     girl.NOM.N.SG  AUX.3.SG  liked.N.SG  own.ACC.N.SG  /    ACC.F.SG  SELF.ACC
     'The girl liked herself.'

Again, this illustrates INDEX agreement in the domain defined by the structural binding theory.

# **4.2 Grammatical agreement: INDEX and CONCORD**

As noted above, in HPSG agreement effectively piggy-backs on other independently justified grammatical processes. Anaphoric agreement is a side-effect of binding (Section 4.1) while grammatical agreement is a side-effect of valence saturation and modification (Section 3.1). The formal HPSG analysis of a particular agreement process mainly consists of positing agreement features somewhere in the feature description; the observed properties follow from the location of those agreement features. With regard to the location of the features, grammatical agreement bifurcates into two types, INDEX and CONCORD.<sup>5</sup> (The attribute name CONCORD was introduced by Wechsler & Zlatić 2000: 799, Wechsler & Zlatić 2003: 14; precursors to the idea were treated as HEAD features in Pollard & Sag 1994: Section 2.5.1, and called AGR by Kathol 1999.) The best way to understand this bifurcation of agreement, and indeed the operation of grammatical agreement

<sup>5</sup>The INDEX/CONCORD theory is sketched in Pollard & Sag (1994: Chapter 2) and Kathol (1999), and developed in detail in Wechsler & Zlatić (2000; 2003), all in the HPSG framework. It has since been adopted into LFG (King & Dalrymple 2004, inter alia) and GB/Minimalism (Danon 2011).


systems generally, is by considering their diachronic origin. Although our primary goal is the description of synchronic grammar, a look at diachrony can help explain the forms that the grammar takes, and can also provide clues as to the best formalization of it.

Within the diachronic literature on agreement there are thought to be two different lexical sources for agreement inflections: (i) incorporated pronouns and (ii) incorporated noun classifiers (Greenberg 1978). These two sources, ultimately traced to pronouns and common nouns, give rise to INDEX and CONCORD target inflections, respectively, as explained next.

### **4.2.1 INDEX agreement**

Taking pronouns first, many grammatical agreement systems evolve historically from the incorporation of pronominal arguments into the predicates selecting those arguments, such as verbs and nouns (Bopp 1842; Givón 1976; Wald 1979, inter alia). When a phrase serving as antecedent of the incorporated pronoun is reanalyzed as the true subject or object of the predicate, the pronominal affix effectively becomes an agreement marker. With this reanalysis the only change in the affix is that it loses its ability to refer: it no longer functions as a pronoun. The affix retains its agreement features, and what was formerly anaphoric agreement with the topic becomes grammatical agreement with the subject or object. This explains why the features of grammatical agreement match those of pronominal anaphora: typically person, number, and gender, with occasional deictic features (Bresnan & Mchombo 1987: 752).

As explained above, structural anaphoric binding involves identifying (structure sharing) the referential indices of the pronoun and its binder. Therefore grammatical agreement derived from it is also INDEX agreement. For example, the signs for English *is* and *I* in (1) and (2) above should be rewritten as follows:


(13) Simplified sign for *is*, illustrating INDEX agreement:
     PHON    *is*
     SUBJ    ⟨ 1 ⟩
     ARG-ST  ⟨ 1 NP[ CONTENT|INDEX [ PER *3rd*, NUM *sg* ] ], … ⟩


(14) Sign for *I*, illustrating INDEX features:

     PHON                        *I*
     CONTENT|INDEX  1 [ PER *1st*, NUM *sg* ]
     CONTEXT|C-INDICES|SPEAKER  1


The finite verb form in (13) specifies third person singular features of its subject's referential index.

One salient distinguishing characteristic of INDEX agreement is that it includes the PERSON feature. The only known diachronic source of the PERSON feature is from pronouns. Therefore, the other type of agreement, CONCORD, lacks the PERSON feature (as we will see below).

By modeling verb agreement in a way that reflects its historical origin, we are able to explain an array of facts concerning particular agreement systems. Some of these facts and explanations are presented in Section 6 below.

### **4.2.2 CONCORD**

The agreement inflections on modifiers of nouns, such as adjectives and determiners, are thought to derive historically not from pronouns, but from noun classifiers (Greenberg 1978; Reid 1997; Seifart 2009; Grinevald & Seifart 2004, Corbett 2006: 268–269). The classifier morphemes in turn derive historically from lexical common nouns denoting superordinate categories like animal, woman, man, etc. For example Reid (1997) posits the following historical development of Ngan'gityemerri (southern Daly; southwest of Darwin, Australia), a language where the historical stages continue to cooccur in the current synchronic grammar. Originally the language had general-specific pairings of nouns as a common syntactic construction, such as *gagu wamanggal* 'animal wallaby' in (15a) (from Reid 1997: 216). The specific noun can be omitted when reference to it is established in discourse, leaving the general noun and modifier, to form NPs like *gagu kerre*, literally 'animal big' but functioning roughly like nominal ellipsis 'big one'. Then, where the specific noun is also included, both noun and modifier attract the generic term (15b). The gender markers then reduce phonologically and incorporate, producing modifier gender agreement (15c).

(15) a. Stage I: (Ngan'gityemerri)
        Gagu    wamanggal  kerre  ngeben-da.
        animal  wallaby    big    1SG.SUBJ.AUX-shoot
        'I shot a big wallaby.'
     b. Stage II:
        Gagu    wamanggal  gagu    kerre  ngeben-da.
        animal  wallaby    animal  big    1SG.SUBJ.AUX-shoot
        'I shot a big wallaby.'


c. Stage III:

        wa=ngurmumba  wa=ngayi   darany-fipal-nyine.
        male=youth    male=mine  3SG.SUBJ.AUX-return-FOC
        'My initiand son has just returned.'

If the same affix is retained on the modifiers and the noun they modify, then the result is symmetrical agreement (also known as alliterative agreement), like the feminine *-a* endings in Spanish *zona rosa* (Corbett 2006: 87–88). But often an asymmetry between the affixes on the noun and the modifiers develops: the noun affix becomes obligatory and is subject to morphophonological processes that do not affect the modifier affix (Reid 1997: 216). This process may further progress to "prefix absorption" into the common noun, as evidenced by "gender prefixed nominal roots being interpreted as stems for further gender marking" (Reid 1997: 217).

Agreement marked with inflections from such nominal sources is called *concord*, which is described using the HPSG CONCORD feature. What is the proper HPSG formalization of this type of agreement, given its provenance? The last stages of the diachronic development, described in the previous paragraph, imply that the *form* of the trigger (the noun) is influenced by the agreement features. That is, noun declension classes tend to correlate with gender assignment (and more generally, phonological and morphological characteristics of nouns correlate with gender assignment); and number is marked on nouns as well. (This close relation between declension class and CONCORD is demonstrated in detail in Wechsler & Zlatić 2003: Chapter 2.) Thus the agreement features must appear both on the head noun (to inform its form and/or its gender selection and number value) and on the phrasal projection of that noun (to trigger agreement via the MOD feature of the agreement targets). Ergo CONCORD is a HEAD feature of the trigger.

Along with the number and gender features, the CONCORD value is assumed to include the case feature when case is a feature of NPs realized on both the head noun and its modifying adjectives or determiner. CONCORD lacks the person feature, since common nouns, from which the agreement inflections on the targets derive, lack the person feature (common nouns do not distinguish person values, since they are all in the third person). Meanwhile, INDEX agreement preserves the pronominal features of person, number, and gender, reflecting its origins. In the usual case the number and gender values found in CONCORD match those found in INDEX. The Serbo-Croatian noun form *knjiga* triggers feminine singular nominative CONCORD on its adjectival possessive specifier and modifier, and third person singular INDEX agreement on the finite auxiliary. (The status of the participle is discussed below.)


(16) Moja          stara      knjiga          je        pala.<sup>6</sup>   (Serbo-Croatian)
     my.F.NOM.SG   old.F.NOM  book(F).NOM.SG  AUX.3.SG  fall.PTCP.F.SG
     'My old book fell.'

The nominative singular noun form *knjiga* specifies its agreement features in both CONCORD (a HEAD feature) and INDEX, with the respective values for number and gender shared:

(17) Lexical sign for *knjiga* 'book' (from Wechsler & Zlatić 2003: 18):

     PHON           *knjiga*
     HEAD           [ *noun*, CONCORD [ CASE *nom*, NUM 1 *sg*, GEND 2 *fem* ] ]
     SPR            ⟨ AP ⟩
     CONTENT|INDEX  [ PER *3rd*, NUM 1 , GEND 2 ]


The specifier (SPR) is shown as AP because the possessive phrase is categorially an adjective phrase in Serbo-Croatian. The features in the overlap between CONCORD and INDEX are normally shared as in this example. But with some special nouns, features can be asymmetrically specified in only one of the two values (with no reentrancy linking them, of course). This leads to mismatches between CONCORD and INDEX targets, discussed in Section 6 below.

The phi features also appear within the HEAD value, as shown in (17), so that adjunct APs can agree with those features. For example, concord by the attributive adjective *stara* 'old' is guaranteed because its MOD feature is specified for feminine singular features, as shown in (8) in Section 3.1 above.

# **4.3 Conclusion**

To summarize this section, we have seen the two main historical paths to agreement, and shown how HPSG formalizes these two types of agreement so as

<sup>6</sup>Wechsler & Zlatić (2003: 18)


to capture the syntactic and semantic properties that follow directly from their origins. Agreement that descends from anaphoric agreement of pronouns with their antecedents, through the incorporation of personal pronouns into verbs and other predicators, inherits the INDEX matching process found in the anaphoric agreement from which it descends. Agreement that descends from the incorporation of noun classifiers involves features located in the HEAD value that connect a trigger noun form to its phrasal projection. The feature sets differ for the same reason; PERSON is a feature only of the first type, and CASE only of the second. CONCORD correlates strongly with declension class, while INDEX agreement need not correlate as strongly (for evidence see Wechsler & Zlatić 2003: Chapter 2). The differences in feature sets and morphology further correlate with systematic syntactic differences, described in the following section.
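The two resulting feature sets can be summarized as follows (CASE belongs to CONCORD only where case is marked on both nouns and their modifiers, as noted above):

| Feature | INDEX | CONCORD |
|---------|-------|---------|
| PERSON  | yes   | no      |
| NUMBER  | yes   | yes     |
| GENDER  | yes   | yes     |
| CASE    | no    | yes     |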

# **5 Syntactic, semantic, and default agreement**

This chapter has so far focused mainly on formal agreement, as opposed to semantic agreement. But this is one of three different ways in which the form of an agreement target may be determined by a grammar:

	- a. Formal agreement: The target form depends on the trigger's formal phi features.
	- b. Semantic 'agreement': The target form depends on the trigger's meaning.
	- c. Failure of agreement: The target fails to agree and hence takes its default form.

In formal agreement, the trigger is grammatically specified for certain features as a consequence of the words making up the trigger phrase: for example a nominal may be marked for a gender as a consequence of the lexical gender of the head noun. In semantic agreement, the target is sensitive to the meaning of the trigger instead of its formal features. English number agreement can be formal as in (19), from Wechsler (2013: 92), or semantic as in (20), from McCloskey (1991: 564–565):

	- b. His clothing is/\*are dirty.


b. That the president will be reelected and that he will be impeached are/??is equally likely at this point.

Regarding (20), McCloskey (1991: 564–565) observes that singular is used for "a single complex state of affairs or situation-type", while plural is possible for "a plurality of distinct states of affairs or situation-types". The latter sort of interpretation is facilitated by the use of the adverb *equally*. Formal and semantic gender agreement are illustrated by the French examples in (21):

(21) b. Dupont  est  { compétent       / compétente }.
        Dupont  is   { competent.M.SG  / competent.F.SG }
        'Dupont { a man / a woman } is competent.'

The grammatically feminine noun *sentinelle* 'sentry' triggers feminine agreement regardless of the sex of the sentry; but in (21b) feminine agreement indicates that Dupont is female while masculine agreement indicates that Dupont is male.

How does the grammar negotiate between formal and semantic agreement? In HPSG, syntactic and semantic representations are composed in tandem, making the framework well suited to address this question. It was addressed in early HPSG work, including Pollard & Sag (1994: Chapter 1). The specific approach due to Wechsler (2011) exploits the underspecification of agreement features (see Section 2). I posit the Agreement Marking Principle (AMP), which states that target agreement features are semantically interpreted whenever the trigger is underspecified for the formal grammatical features to which the target would normally be sensitive. The subject phrases in (19) are specified for number due to the formal features of the head nouns, but those in (20) are not, as a (coordinate) clause has no grammatical source for those features. Consequently, by the AMP, the verb's number feature is semantically interpreted in (20). Similarly, *sentinelle* in (21a) gives its formal feminine gender feature to the subject, while *Dupont* lacks a gender specification, triggering the semantic interpretation of the target adjectives in (21b): feminine is interpreted as 'female'.

Agreement targets generally have a default form for use when there is no trigger or the normal agreement relation is blocked for some reason. Blocking of agreement comes about in various situations; here we consider a case where the trigger is interpreted metonymically, apparently resulting in a reassignment of


the referential index. Swedish predicate adjectives normally agree with their subjects in number (either singular or plural) and grammatical gender, either neuter (N) or 'common' gender (COM), the gender held in common between masculine and feminine:

(22) b. Pannkak-an          är      god.
        pancake-DEF.COM.SG  be.PRS  good.COM.SG
        'The pancake is good.'
     c. { Hus-en        / Pannkak-orna }    är      god-a.
        { house-PL.DEF  / pancake-PL.DEF }  be.PRS  good-PL
        'The houses / The pancakes are good.'

As shown in (22), a predicate adjective is inflected for number, and, in the singular, for gender, and agrees with its subject. But in sentences like (23), the adjective appears in the neuter singular form, regardless of the number and gender features of the subject. Note that *pannkakor* is the plural form of a common gender noun (Faarlund 1977; Enger 2004; Josefsson 2009):
(23) Pannkakor  är      gott.           (Swedish)
     pancakes   be.PRS  good.N.SG
     'Pancakes is good.'


In general, Swedish predicate adjectives appear in neuter singular when there is no triggering NP, such as with clausal subjects (see (25a) below). Wechsler & Zlatić (2003: 154) posit the index type *unm* ('unmarked') for referential indices that lack phi features, such as those introduced by verbs. So *gott* has a SUBJ list item whose index is disjunctively specified for either neuter singular or type *unm*.
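In outline, the relevant part of the adjective's valence specification is then the following (a sketch; attribute paths abbreviated):

```
gott:  SUBJ ⟨ NP[ INDEX ( [NUM sg, GEND neut] ∨ unm ) ] ⟩
```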

The lack of agreement in (23) then arises because the subject phrase refers, not to the pancakes, but to a situation involving them; hence its referential index is distinct from the one lexically introduced by the noun *pannkakor*. A rule shifts the index and encodes the metonymic relation between the entity and the situation involving it. This is implemented with a non-branching phrasal construction in Wechsler (2013: 82):


(24) Metonymy schema adapted from Wechsler (2013: 82):

```
metonymy-phrase ⇒
  SYNSEM [ CAT   NP
           CONT  [ INDEX  s
                   RESTR  { involve(s, i) } ∪ 1 ] ]
  DTRS ⟨ [ SYNSEM [ CAT   NP
                    CONT  [ INDEX  i
                            RESTR  1 ] ] ] ⟩
```
The noun *pannkakor* in (23) has an index marked with the features [PERSON *3rd*], [GENDER *com*], and [NUMBER *pl*], which, by the Semantics Principle, are therefore shared with the index of the daughter NP node in a structure licensed by rule (24). But the construction specifies that the mother NP node's index is unmarked for those features, thus explaining the neuter singular adjective.

On the alternative ellipsis analysis, sentence (23) has an elliptical clausal or infinitival subject, with a structure like (25a) except that *att äta* is silent (Faarlund 1977; Enger 2004; Josefsson 2009):

(25) a. Att  äta  pannkakor  är      gott.
        to   eat  pancakes   be.PRS  good.N.SG
        'To eat pancakes is good.'
     b. Det  är      gott       att  äta  pannkakor.
        it   be.PRS  good.N.SG  to   eat  pancakes
        'It is good to eat pancakes.'
     c. * Det  är      gott       pannkakor.
          it   be.PRS  good.N.SG  pancakes
        Intended: 'It is good to eat pancakes.'

But the metonymic subject behaves in all respects like an NP, and unlike a clause or infinitival phrase. For example, unlike an infinitival it resists extraposition, as shown in (25b, c). The metonymy analysis captures the fact that the subject has a clause-like meaning but not clause-like syntax.

# **6 Mixed agreement**

The two-feature (INDEX/CONCORD) theory of agreement was originally motivated by *mixed agreement*, where a single phrase triggers different features on distinct targets (Pollard & Sag 1994: Chapter 2; Kathol 1999). For example, the French


second person plural pronoun *vous* refers to multiple addressees, and also has an honorific or polite use for a single (or multiple) addressee. When used to refer politely to one addressee, *vous* triggers singular on a predicate adjective but plural on the verb, as in (26a):


(26) a. Vous    êtes    loyal.         (French)
        you.PL  be.2PL  loyal.M.SG
        'You (polite singular) are loyal.'
     b. Vous    êtes    loyaux.
        you.PL  be.2PL  loyal.PL
        'You (plural) are loyal.'

Wechsler (2011) analyzes this by adopting the following suppositions: (i) *vous* has a second person plural marked referential index; (ii) *vous* lacks phi features for CONCORD; (iii) finite verbs agree with their subjects in INDEX; and (iv) predicate adjectives agree with their subjects in CONCORD. Suppositions (i) and (iii) need not be stipulated, as they follow from the theory: the pronoun must have INDEX phi features since it shows anaphoric agreement (when it serves as binder or bindee); and the verb must agree in INDEX since it includes the PERSON feature. By the Agreement Marking Principle (see Section 5), the (CONCORD) number and gender features of the predicate adjective are interpreted semantically, which is what is shown by example (26).
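Under suppositions (i) and (ii), the pronoun's sign can be sketched as follows (feature geometry simplified; the CONCORD value is left wholly unspecified):

```
 PHON           vous
 CONCORD        [ ]                      (no number or gender specified)
 CONTENT|INDEX  [ PER 2nd, NUM pl ]
```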

"Polite plural pronouns" of this kind are found in many languages of the world (Head 1978). The cross-linguistic agreement patterns observed in typological studies (Comrie 1975; Wechsler 2011) confirm the predictions of the theory. Taken together, suppositions (i) and (iii) from the previous paragraph entail that any person agreement targets agreeing with polite pronouns should show formal, rather than semantic, agreement. Targets lacking person, meanwhile, can vary across languages. This pattern is confirmed for all languages with polite plurals that have been surveyed, including Romance languages; Modern Greek; Germanic (Icelandic); West, South and East Slavic; Hindi; Gbaya (Niger-Congo); Kobon and Usan (Papuan); and Sakha (Turkic) (see Comrie 1975 and Wechsler 2011).

The INDEX/CONCORD distinction plays a crucial role in this account of mixed agreement. An earlier hypothesis, proposed by Kathol (1999: 230), is that French predicate adjectives are grammatically specified for semantic agreement with their subjects, while finite verbs show formal agreement. But a plurale tantum noun such as *ciseaux* 'scissors' triggers syntactic agreement on the predicate adjective:


(27) Ces       ciseaux         sont    géniaux!        (* génial!)        (French)
     these.PL  scissors(M.PL)  are.PL  brilliant.M.PL     brilliant.M.SG
     'These scissors are cool!'

As far as the syntax is concerned, *ciseaux* 'scissors' is an ordinary common noun with masculine plural CONCORD features, so it triggers those features on the adjective. More generally, agreement target types cannot be split into "formal" and "semantic" agreement targets; both formal and semantic agreement are found across all target types. Which of the two is observed for a given agreement feature depends, according to the INDEX/CONCORD theory, on whether the trigger is specified for the grammatical feature, together with the INDEX versus CONCORD status of the target.

# **7 Agreement defined on other structures**

So far our look at grammatical agreement has focused primarily on agreement defined on local grammatical relations like subject, object, and modifier. In this section we look at HPSG analyses of two other types of agreement, namely long-distance and superficial agreement.

# **7.1 Long-distance agreement**

The simple picture of locality in the previous sections is challenged by the phenomenon of long-distance agreement, where the trigger appears within a clause subordinate to the one headed by the target verb. Long-distance agreement has been observed in a number of languages, including Tsez (Nakh-Dagestanian; Polinsky & Potsdam 2001), Hindi-Urdu (Bhatt 2005), and Passamaquoddy (Algonquian; Bruening 2001; LeSourd 2018).

Passamaquoddy long-distance agreement is illustrated by this sentence (LeSourd 2018: example (5)), with the relevant elements indicated in italics:

(28) N-kosicíy-a-*k*     [eli   Píyel  -litahási-t  [eli-kis-ankum-í-hti-t         *nìkt*      *ehpíc-ik*     posonúti-yil]]   (Passamaquoddy)
     1-know-DIR-PROX.PL  thus-  Píyel  -think-3AN   thus-PST-sell-3/1-PROX.PL-3AN  those.PROX  woman-PROX.PL  basket-IN.PL
     'I know that Píyel thinks that those women sold me the baskets.'


The *-k* suffix on the matrix verb *kosicíy* 'know' marks plural, deictically proximate agreement with the phrase *nìkt ehpícik* 'those women' in the doubly embedded subordinate clause. LeSourd (2018) analyzes Passamaquoddy long-distance agreement in the HPSG framework. He notes that Passamaquoddy long-distance agreement is paralleled by long-distance raising, in which an NP in the matrix clause is coreferential with an implicit argument of a subordinate clause (LeSourd 2018: example (4)):

(29) N-kosicíy-a-*k*     *nìkt*      *ehpíc-ik*     [eli   Píyel  -litahási-t  [eli-kis-ankum-í-hti-t         *e*  posonúti-yil]]   (Passamaquoddy)
     1-know-DIR-PROX.PL  those.PROX  woman-PROX.PL  thus-  Píyel  -think-3AN   thus-PST-sell-3/1-PROX.PL-3AN       basket-IN.PL
     'I know about those women that Píyel thinks that they sold me the baskets.'

Passamaquoddy speakers report that sentences (28) and (29) suggest the subject of 'know' (the speaker) is familiar with the women. This provides evidence that the phrase 'those women' in (29) is an argument of the matrix verb 'know', as implied by the translation. Similarly, the matrix clause (28) contains a null argument (cross-referenced by the proximate plural *-k* suffix), which is cataphoric to 'those women'. Hence a more literal translation of (28) is 'I know about them that Píyel thinks that those women sold me the baskets.'<sup>7</sup> What the long-distance agreement and raising constructions share is simply that the matrix object is coreferential with some argument contained in the subordinate clause. The following lexical entry for the verb root *kosicíy* 'know' captures that:

(30) *kosicíy* 'know':

" PHON *kosicíy* ARG-ST NP , NP , S:- RESTR …, PRD , … #

LeSourd adopts the version of HPSG described in the Sag et al. (2003) textbook, which uses a simplified Minimal Recursion Semantics. The semantic restrictions feature (RESTR) takes as its value a list of elementary predications. The list for each node is a concatenation of the restrictions of the daughter nodes. Thus every semantic argument contained within the S complement, whether overt or null, will correspond to some argument of an elementary predication in S's RESTR list. The lexical entry in (30) stipulates that the matrix object NP corefers with

<sup>7</sup>LeSourd notes that Passamaquoddy lacks Principle C effects, so cataphora of this kind is permitted.


some such argument, namely the argument of the predicate PRD. In conclusion, Passamaquoddy long-distance agreement is really the anaphoric agreement of a null anaphor, cross-referenced on its verb, with an antecedent in a higher clause.

# **7.2 Superficial agreement**

In some languages, string adjacency of the trigger and target, rather than a grammatical relation such as subject or modifier, is a grammatical condition on agreement. This may arise because person agreement derives historically from pronoun incorporation, and a basic syntactic precondition for incorporation is string adjacency between the pronoun and the head into which it incorporates (Givón 1976; Ariel 1999; Wechsler et al. 2010; Fuß 2005). If the trigger occupies the syntactic position that the pronoun occupied prior to incorporation (for example because the trigger is itself a pronoun) then the result is that trigger and target are adjacent. For example, West Flemish complementizers agree with an immediately following subject, even though the complementizer and subject are not related by any grammatical relation (Haegeman 1992). To take another example, Borsley (2009) analyzes Welsh superficial agreement in the HPSG framework, citing examples like the following:

(31) b. arno      fo    (Welsh)
        on.3SG.M  he
        'on him'
     c. Gweles        i  a    Megan  geffyl.
        see.PAST.1SG  I  and  Megan  horse
        'Megan and I saw a horse.'

The trigger is the subject in (31a), object in (31b), and the first conjunct of a coordinate subject in (31c). But in every case, "An agreeing element agrees with an immediately following noun phrase if and only if the latter is a pronoun" (Borsley 2009: 237). Borsley (2009: 257) expresses this as an HPSG implicational constraint using the DOMain feature from linearization theory (Reape 1994; Müller 1995; 1999; Kathol 2000; see also Müller 2021b: Section 6, Chapter 10 of this volume):

(32) [ DOM ⟨ [AGR 1 ], NP: 2 , … ⟩ ] ⇒ 1 = 2

The DOMain list encodes linear precedence between constituents that are not necessarily sisters. In (32) the AGR value is the set of phi features of the target; the colon following NP represents the semantic CONTENT attribute; and the subscripted tag 2 is the INDEX value. The rule states that when a constituent bearing the AGR attribute is immediately followed by a personal pronoun (content of type *ppro*), then the AGR value is identified with the pronoun's index (shown here as 2 ), that is, it agrees with a right-adjacent pronoun.
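Applied to (31b), for instance, the inflected preposition *arno* is immediately followed by the pronoun *fo* in the DOM list, so (32) identifies the preposition's AGR value with the pronoun's index (a sketch, with the shared value spelled out):

```
 [ DOM ⟨ arno[AGR 1 ], fo NP: 2 , … ⟩ ]   ⇒   1 = 2 = [PER 3rd, NUM sg, GEND masc]
```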

# **8 Conclusion**

Agreement is analyzed in HPSG by assigning phi features to specific locations in the feature descriptions that make up the grammar. Anaphoric agreement results from phi features appearing on the referential indices of the binder and bindee, together with the assumption that binding consists of the identification of those indices. Verbal agreement with subjects and objects results when phi features appear on the verb's ARG-ST list items that are identified with the SYNSEM values of the subject and object phrases. Modifier agreement with heads occurs when phi features appear within the MOD value of the modifier. According to the INDEX/CONCORD theory, when agreement is historically descended from anaphoric agreement of incorporated pronouns, then those features within the ARG-ST list or MOD items are located on the referential index; while otherwise they are collected in the CONCORD feature and placed within the value of the HEAD features. The locality conditions on agreement follow from the normal operation of the grammar in which those phi features are embedded. Some cases of agreement seem to exist outside those conditions. Long-distance agreement has been analyzed as a kind of anaphoric agreement within a prolepsis construction, and superficial agreement has been defined on string adjacency and precedence, within linearization theory.

# **Abbreviations**

AN animate


# **Acknowledgments**

For their insightful comments on earlier drafts of this chapter, I would like to thank (in alphabetical order): Anne Abeillé, Robert Borsley, George Aaron Broadwell, Jean-Pierre Koenig, Antonio Machicao y Priemer and Stefan Müller. All of the reviews were valuable and together they greatly improved the paper.

Stephen Wechsler

# **References**




# **Chapter 7**

# **Case**

# Adam Przepiórkowski

University of Warsaw and Polish Academy of Sciences

The aim of this chapter is to provide an outline of HPSG work on grammatical case. Two issues that attracted much attention from HPSG practitioners in the 1990s and early 2000s are the locality of case assignment, especially so-called structural case assignment, and case syncretism and underspecification; they are discussed in two separate sections. The final section summarises other work on case carried out within HPSG, including some computational efforts, as well as investigations of case phenomena at the syntax-semantics interface and at the border of syntax and morphology.

# **1 Introduction**

HPSG is not widely known for its approach to grammatical case. For example, it is only mentioned in passing in the 2006 monograph *Theories of Case* (Butt 2006: 225) and in the 2009 *Oxford Handbook of Case* (Malchukov & Spencer 2009: 43), which features separate articles on GB/Minimalism, Lexical Functional Grammar, Optimality Theory and other grammatical frameworks. As most of the HPSG work on case was carried out in the 1990s and early 2000s, this perception is unlikely to have changed since the publication of these two volumes.

The aim of this chapter is to provide an overview of HPSG work on grammatical case and to show that it does offer novel solutions to some of the problems related to case. Two main research areas are presented in the two ensuing sections: structural case assignment is discussed in Section 2 and case syncretism and underspecification in Section 3. Some of the other HPSG work on case, including implementational work, is outlined in Section 4.

Adam Przepiórkowski. 2021. Case. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 245–274. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599830

# **2 Structural case assignment**

Pollard & Sag (1994) did not envisage a separate theory of case:<sup>1</sup> "Nominative case assignment takes place directly within the lexical entry of the finite verb", while "the subject SUBCAT element of a nonfinite verb […] does not have a case value specified" (p. 30). However, they added in a footnote on the same page that "for languages with more complex case systems, some sort of distinction analogous to the one characterized in GB work as 'inherent' vs. 'structural' is required."

In the transformational Government and Binding theory of the 1980s (GB; Chomsky 1981; 1986), "inherent" – or "lexical" – case is understood as rigidly assigned by the head and independent of syntactic environment, while "structural" case varies with the structural context (e.g., Haider 1985: 70). This difference can be illustrated on the basis of the following examples from German (Przepiórkowski 1999a: 63, based on data from Heinz & Matiasek 1994):


(2) c. the helping the plumber.GEN
       'the help from/\*for the plumber'

<sup>1</sup>This section is to some extent based on Przepiórkowski (1999a: Section 3.4 and Chapter 4); see also Müller (2013: Chapter 14).


In (1), both arguments of the verb UNTERSTÜTZEN<sup>2</sup> 'support' receive structural case: the patient argument occurs in the accusative in (1a), in the nominative in (1b), and in the genitive in (1c). Similarly, the agent argument is in the nominative in (1a), but it may only occur in the genitive in (1c); hence, the single argument marked as genitive in (1c) is ambiguous between the agent and the patient. In the case of (2), the agent argument of HELFEN 'help' is similarly assigned structural case, but the patient argument receives a rigid inherent case: it is always the dative, so, e.g., the genitive in (2c) may only be understood as marking the agent.

Such examples may still be handled without any general principles of case assignment. For example, lexical rules (Pollard & Sag 1987: 209–218) responsible for forming passive participles (as in the b. examples above) and nominalisations (as in the c. examples) might be responsible for manipulating case values of arguments, e.g., for translating nominative and accusative – but not dative – to genitive in the case of nominalisations. However, the interaction of the structural/inherent case dichotomy with raising (and – in some languages – with control) motivates a more comprehensive approach to case assignment.

Consider Icelandic raising verbs (all Icelandic data is taken from Sag et al. 1992: 304–305):

(3) a. *Hann* virðist elska hana. (Icelandic)
       he.NOM seems love.INF her.ACC
       'He seems to love her.'
    b. Þeir telja *Maríu* hafa skrifað ritgerðina.
       they believe Mary.ACC have.INF written the.thesis
       'They believe Mary to have written her thesis.'

As in other languages, the subject of the infinitival verb normally receives the nominative case when it is raised to the higher subject position, as in (3a), and the accusative case when it is raised to the object position, as in (3b). This could be easily modelled in accordance with the suggestion of Pollard & Sag (1994: 30) that infinitival verbs do not assign case to their subjects, while finite verbs – in this case finite raising verbs – normally assign nominative to their subjects and accusative to their objects. But, as is well known (Andrews 1982; Zaenen & Maling 1983; Zaenen et al. 1985), some Icelandic verbs idiosyncratically assign specific "quirky" cases to their subjects, and when they do, the higher raising verbs must honour this assignment:

<sup>2</sup>Note the convention of using small capitals to typeset lemmata.

(4) b. Hann telur *mig* vanta peninga. (Icelandic)
       he.NOM believes me.ACC lack.INF money
       'He believes that I lack money.'

(6) b. Hann telur *verkjanna* ekki gæta.
       he believes the.pains.GEN not be.noticeable.INF
       'He believes the pains to be not noticeable.'

Thus, in (4), the understood subject of the infinitival VANTA 'lack' must be in the accusative, whether it is raised to the object position, as in (4b), where the accusative would be expected anyway, or to the subject position, as in (4a), where normally the nominative would be expected. This works similarly in the case of verbs idiosyncratically assigning their subject the dative case, as in (5), or the genitive case, as in (6).

The difficulty presented by such examples is this. If finite raising verbs were assumed to assign case to the raised subjects – nominative in the case of raising to subject and accusative in the case of raising to object – then this would clash with "quirky" cases assigned to their subjects by some verbs: (4a), (5) and (6) would be predicted to be ungrammatical. If, on the other hand, such raising verbs did not assign case to the raised arguments, instead relying on the lower verbs to assign appropriate cases to their subjects, then it is not clear what case should be assigned to their subjects by the usual – not "quirky" – verbs: it cannot always be the nominative, as the accusative is witnessed when the subject is raised to the object position, as in (3b); similarly, it cannot always be the accusative, as the nominative surfaces when the subject is raised to the subject position, as in (3a).


The intuition of the analysis proposed in Sag et al. (1992) relies on the distinction between structural and inherent case assignment, although these terms do not appear in that paper. Verbs such as those in (4)–(6) assign their subjects specific inherent cases (accusative in (4), dative in (5) and genitive in (6)), while the usual verbs, as in (3), only mark their subjects as structural, to be assigned case elsewhere. Finite raising verbs are, in a way, sensitive to this distinction, and only assign the nominative (in the case of raising to subject) or accusative (in the case of raising to object) to such structural arguments. While Sag et al. (1992) represent this distinction between structural and inherent case implicitly, via the interaction of two attributes, CASE (realised case) and DCASE (default case), later HPSG work assumes explicit representation of the two kinds of case as two subtypes of *case* in the type hierarchy: *str*(uctural) and *lex*(ical). Such a *case* type hierarchy is, apparently independently, alluded to in Pollard (1994) and introduced in detail in Heinz & Matiasek (1994), to which we turn presently.

On the basis of German examples such as (1)–(2), Heinz & Matiasek (1994) argue that out of four morphological cases in German – nominative, accusative, genitive and dative – the first three (i.e., with the exception of the dative) may be assigned structurally, by general case assignment principles. Similarly, they argue that the last three (i.e., apart from the nominative) may also be assigned lexically, in which case they are stable across various syntactic environments. These empirical observations are translated into the *case* hierarchy in Figure 1.

Figure 1: Heinz & Matiasek's (1994: 207) case hierarchy for German encoding the structural/lexical distinction

Particular verbs may assign specific lexical cases to their arguments, e.g., *ldat*. They may also specify arguments as bearing structural case, in which case only the *str*(*uctural*) supertype is mentioned in the lexicon. For example, the lexical entries for UNTERSTÜTZEN 'support' and HELFEN 'help' contain the following subcategorisation requirements:

(7) a. UNTERSTÜTZEN: [SUBCAT ⟨NP[*str*], NP[*str*]⟩]
    b. HELFEN: [SUBCAT ⟨NP[*str*], NP[*ldat*]⟩]

Assuming a similar *case* hierarchy for Icelandic, the difference between the usual verbs, such as ELSKA 'love' in (3a), and "quirky" subject verbs, such as VANTA 'lack' in (4), could be represented as below (omitting non-initial arguments):

(8) a. ELSKA: [SUBCAT ⟨NP[*str*], …⟩]
    b. VANTA: [SUBCAT ⟨NP[*lacc*], …⟩]

Since Pollard (1994) and Heinz & Matiasek (1994), such representations of case requirements are generally adopted in HPSG,<sup>3</sup> the only difference being that SUBCAT has since been replaced with ARG-ST. The point where different approaches diverge is how exactly structural case is resolved to a specific morphological case.

The simplest principle would resolve the case of the first *str* argument of a pure (non-gerundial) verb to nominative, i.e., to *snom*, the case of any subsequent *str* argument of a pure verb to accusative, i.e., to *sacc*, and the case of any *str* argument of a nominalisation to *sgen*. Unfortunately, this simple principle would not work in various cases of raising, e.g., in the case of the Icelandic data above. While the "quirky" cases in (4)–(6) would be properly taken care of by this approach – once the subject is assigned a specific lexical case it is outside of the realm of a principle resolving structural cases – structural subjects raised to a higher verb would be assigned specific case twice (or more times, in the case of longer raising chains): on the SUBCAT (or ARG-ST) of the lower verb and on the SUBCAT (or ARG-ST) of the raising verb.<sup>4</sup> This would not necessarily lead to problems in the case of raising to subject verbs, as in (3a), as the structural argument would be the subject in both subcategorisation frames, so its case would be resolved to *snom* twice, but it would create a problem in the case of raising to object verbs, as in (3b), as the case of the raised argument would be resolved to the nominative on the lower subcategorisation frame and to the accusative on the higher frame. So, the problem is not limited to Icelandic, but may be observed in any language with raising to object (also known as Exceptional Case Marking or Accusativus cum Infinitivo or AcI), including German (cf., e.g., Heinz & Matiasek 1994: 231): if a structural argument occurs on a number of SUBCAT

<sup>3</sup>Recent examples being Machicao y Priemer & Fritz-Huechante (2018: 169) and Müller (2018: Chapter 7.2.1).

<sup>4</sup>See Abeillé 2021, Chapter 12 of this volume, on the analysis of raising in HPSG.


or ARG-ST lists, it should be assigned specific morphological case according to its position on just one of them – the highest one.
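The problem can be made concrete with a toy resolver (my own Python illustration; nothing here is part of the cited analyses): resolving *str* purely by list position gives the same raised argument two different cases on the lower and the higher list.

```python
# A deliberately naive implementation of the "simplest principle": the
# first *str* argument of a verb resolves to snom, any later *str*
# argument to sacc (the sgen clause for nominalisations is omitted).

def resolve_naive(arg_st):
    return ['snom' if case == 'str' and i == 0 else
            'sacc' if case == 'str' else case      # lexical cases stay put
            for i, case in enumerate(arg_st)]

# Raising to object, as in (3b): the raised subject is the first element
# of the lower verb's list but the second element of the raising verb's
# list, so the naive principle resolves the same argument twice, to
# clashing values.
print(resolve_naive(['str']))         # ['snom']          (lower infinitive)
print(resolve_naive(['str', 'str']))  # ['snom', 'sacc']  (raising verb)
```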

Both Pollard (1994) and Heinz & Matiasek (1994) account for such facts via configurational case principles, e.g. Heinz & Matiasek (1994: 209):

(9) Case Principle (for German):

In a *head-complement-structure* whose head has category

- *verb*[*fin*]: the external argument has a CASE value of *snom*,
- *verb*: the internal argument has a CASE value of *sacc*,
- *noun*: the internal argument has a CASE value of *sgen*.

These are the only saturated or almost saturated *head-complement-structure*s with structural arguments.


Heinz & Matiasek (1994: 209–210) formalise this Case Principle by giving the following constraints:

$$\text{(12)}\quad \begin{bmatrix} \text{SYNSEM}|\text{LOC}|\text{CAT} \begin{bmatrix} \text{HEAD}\ \textit{verb}\big[\text{VFORM}\ \textit{fin}\big] \\ \text{SUBCAT}\ \langle\rangle \end{bmatrix} \\ \text{DTRS}\ \begin{bmatrix} \textit{h-c-str} \\ \text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \text{NP}[\textit{str}], \dots \rangle \end{bmatrix} \end{bmatrix} \Rightarrow \Big[ \text{DTRS}|\text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \text{NP}[\textit{snom}], \dots \rangle \Big]$$

$$\text{(13)}\quad \begin{bmatrix} \text{SYNSEM}|\text{LOC}|\text{CAT} \begin{bmatrix} \text{HEAD}\ \textit{verb} \\ \text{SUBCAT}\ \langle\rangle \vee \langle \textit{synsem} \rangle \end{bmatrix} \\ \text{DTRS}\ \begin{bmatrix} \textit{h-c-str} \\ \text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \textit{synsem}, \text{NP}[\textit{str}], \dots \rangle \end{bmatrix} \end{bmatrix} \Rightarrow \Big[ \text{DTRS}|\text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \textit{synsem}, \text{NP}[\textit{sacc}], \dots \rangle \Big]$$

$$\text{(14)}\quad \begin{bmatrix} \text{SYNSEM}|\text{LOC}|\text{CAT} \begin{bmatrix} \text{HEAD}\ \textit{noun} \\ \text{SUBCAT}\ \langle\rangle \vee \langle \textit{synsem} \rangle \end{bmatrix} \\ \text{DTRS}\ \begin{bmatrix} \textit{h-c-str} \\ \text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \textit{synsem}, \text{NP}[\textit{str}], \dots \rangle \end{bmatrix} \end{bmatrix} \Rightarrow \Big[ \text{DTRS}|\text{HEAD-DTR}|\dots|\text{SUBCAT}\ \langle \textit{synsem}, \text{NP}[\textit{sgen}], \dots \rangle \Big]$$

Note that the locus of this Case Principle is *phrase* and that it makes reference to *head-complement-structure* values of the DAUGHTERS (DTRS) attribute. In this sense, this principle is configurational. Similar principles were proposed for Korean (Yoo 1993; Bratt 1996), English (Grover 1995) and Polish (Przepiórkowski 1996a), *inter alia*.

This configurational approach to case assignment is criticised in Przepiórkowski (1996b; 1999a,b) on the basis of conceptual and theory-internal problems. The conceptual problem is that a configurational analysis is employed for what is usually considered an essentially local phenomenon, one concerned with the relation between a head and its dependents (Blake 1994). The – more immediate – theory-internal problem is that such configurational case principles are restricted to locally realised arguments, and are not necessarily compatible with those – dominant since Pollard & Sag (1994: Chapter 9) – HPSG analyses of extraction which do not assume traces and with those HPSG approaches to cliticisation in which the clitic is realised as an affix rather than as a tree-configurational constituent (cf., e.g., Miller & Sag 1997 on French and Monachesi 1999 on Italian).

The solution proposed in Przepiórkowski (1996b; 1999a,b) is to resolve structural cases directly within ARG-ST, via local principles operating at the level of the *category* of a word (where both head information and argument structure information – but not constituent structure – are available) rather than at the level of *phrase*. This seems to bring back the problem, discussed in connection with the Icelandic data above, of raised arguments, which occur on a number of ARG-ST lists. The innovation of Przepiórkowski (1996b; 1999a,b) is the proposal to mark, within ARG-ST, whether a given argument is realised locally (either tree-configurationally, or as a gap to be filled higher up, or as an affix) or not. If it is realised locally, it may be assigned appropriate case; if it is not (because it is raised), its structural case must be resolved higher up. Under this setup, the above constraints (12)–(13) responsible for the assignment of structural nominative and accusative are replaced with the following two constraints (and similarly for the structural genitive):<sup>5</sup>

<sup>5</sup>The antecedents of such principles could be further constrained to apply to *word*s only. As usual, '⊕' indicates concatenation of lists.


$$\text{(15)}\quad \begin{bmatrix} \text{HEAD}\ \textit{verb} \\ \text{ARG-ST}\ \Big\langle \begin{bmatrix} \text{ARG}\ \text{NP}[\textit{str}] \\ \text{REALIZED}\ + \end{bmatrix} \Big\rangle \oplus \fbox{1} \end{bmatrix} \Rightarrow \Big[ \text{ARG-ST}\ \big\langle \big[\text{ARG}\ \text{NP}[\textit{snom}]\big] \big\rangle \oplus \fbox{1} \Big]$$

$$\text{(16)}\quad \begin{bmatrix} \text{HEAD}\ \textit{verb} \\ \text{ARG-ST}\ \fbox{1}\,\textit{nelist} \oplus \Big\langle \begin{bmatrix} \text{ARG}\ \text{NP}[\textit{str}] \\ \text{REALIZED}\ + \end{bmatrix} \Big\rangle \oplus \fbox{2} \end{bmatrix} \Rightarrow \Big[ \text{ARG-ST}\ \fbox{1} \oplus \big\langle \big[\text{ARG}\ \text{NP}[\textit{sacc}]\big] \big\rangle \oplus \fbox{2} \Big]$$

Obviously, for such constraints to work, values of ARG-ST must be lists of slightly more complex objects than *synsem* (these are now values of ARG within such more complex objects), and additional principles must make sure that values of REALIZED are instantiated properly (see Przepiórkowski 1999a: 78–79 for details).
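The following Python sketch (mine; REALIZED is modelled as a boolean flag on ad-hoc dicts) illustrates how constraints in the spirit of (15)–(16) fire only on locally realised structural arguments, leaving raised arguments for the higher ARG-ST:

```python
# Sketch of non-configurational case resolution in the spirit of (15)-(16):
# structural case is resolved on ARG-ST itself, but only for arguments
# marked [REALIZED +]; unrealised (raised) arguments are left unresolved,
# to be assigned case on the higher ARG-ST where they are realised.

def resolve_arg_st(arg_st):
    out = []
    for i, arg in enumerate(arg_st):
        if arg['case'] == 'str' and arg['realized']:
            out.append({**arg, 'case': 'snom' if i == 0 else 'sacc'})
        else:
            out.append(arg)   # lexical case, or raised: leave untouched
    return out

# Raising to object: the lower verb leaves its unrealised subject as str;
# on the raising verb's ARG-ST, where the argument is realised in second
# position, it is resolved to sacc -- and only once.
lower = resolve_arg_st([{'case': 'str', 'realized': False}])
upper = resolve_arg_st([{'case': 'str', 'realized': True},
                        {'case': 'str', 'realized': True}])
print(lower[0]['case'], upper[1]['case'])  # str sacc
```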

The analysis of Przepiórkowski (1996b; 1999a,b) assumes that an argument is locally realised – and hence may be assigned structural case – if and only if it is not raised to a higher argument structure. Meurers (1999a,b), on the basis of empirical observations in Haider (1990), Grewendorf (1994) and Müller (1997), shows that this assumption does not always hold in German; rather, structural case should be assigned to arguments on the basis of whether they are raised or not, and not whether they are locally realised or not. Consider the following data (Meurers 1999a: 294):

(17) b. [*Einen Außenseiter* gewinnen] läßt Gott hier nie.
        [an.ACC outsider win.INF] lets god here never
        'God never lets an outsider win here.'

Assuming that fronted fragments, marked with square brackets, are single constituents,<sup>6</sup> the subject of *gewinnen* 'win' forms a constituent with this verb, i.e., it has the same configurational realisation in both examples. Hence, configurational case assignment principles should assign it the same case in both instances, contrary to facts: *ein Außenseiter* occurs in the nominative in (17a) and *einen Außenseiter* bears the accusative in (17b). As argued by Meurers (1999a,b), the reason is that – although the subject is realised locally to its infinitival head – it is in some sense raised further to the subject position of the auxiliary *wird*

<sup>6</sup>This assumption is not completely uncontroversial; see Kiss (1994: 100–101) for apparent counterexamples and Müller (2003; 2005; 2021) for a defense of this assumption.

in (17a) and to the object position of the AcI verb *läßt* in (17b), hence the difference in cases. This suggests that structural case should be assigned not where the argument is realised, but on the highest ARG-ST on which it occurs. A corresponding modification of the non-configurational case assignment approach of Przepiórkowski (1996b; 1999a,b) – replacing the [REALIZED +] with [RAISED −] in constraints such as (15)–(16) and providing appropriate constraints on values of RAISED – is proposed in Przepiórkowski (1999a: 93–95); see also Müller (2013: Section 17.4) (and references therein) for further improvements.

While this non-configurational approach to syntactic case assignment was motivated largely by the need to capture complex interactions in a precise way, it turns out to formalise sometimes apparently contradictory intuitions expressed in various approaches to case. First of all, it preserves the common intuition that case is a local phenomenon, an intimate relation between a head and its dependents. Second, it successfully formalises the distinction between structural and inherent/lexical case known from the transformational literature of the 1980s, and non-configurationally encodes the apparently configurational principles of structural case assignment. Third, while most HPSG literature on case is concerned with syntactic phenomena in European languages, this approach has been extended to case stacking known, e.g., from languages of Australia and case attraction observed, e.g., in Classical Armenian and in Gothic (Malouf 2000). Fourth, by allowing antecedents of implicational constraints such as (15)–(16) to be *local* objects, not just syntactic *categories*, semantic factors influencing case assignment may also be taken into account, as in differential case marking, repeatedly considered in Lexical Functional Grammar (cf., e.g., Butt & King 2003 and references therein), but apparently not (so far) in HPSG. Fifth, as pointed out in Przepiórkowski (1999a,b), the above approach to case formalises the "case tier" intuition of Zaenen et al. (1985), Yip et al. (1987) and Maling (1993) (see also Maling 2009).

Let us illustrate the last point with some Finnish data from Maling (1993: 57, 59):

(18) b. Lapsen täytyy lukea kirja kolmannen kerran.
        child.GEN must read book.NOM [third time].ACC
        'The child must read the book for a third time.'
     c. Kekkoseen luotettiin yksi kerta.
        Kekkonen.ILL trust.PASSP [one time].NOM
        'Kekkonen was trusted once.'
     d. Kekkoseen luotettiin yhden kerran yksi vuosi.
        Kekkonen.ILL trust.PASSP [one time].ACC [one year].NOM
        'Kekkonen was trusted for one year once.'

Maling (1993) argues at length that some adjuncts (adverbials of measure, duration and frequency) behave just like objects with respect to case assignment and, in particular, notes the following generalisation about syntactic case assignment: only one NP dependent of the verb receives the nominative, namely the one which has the highest grammatical function; other dependents receive the accusative.<sup>7</sup> Thus, if none of the arguments bears inherent case, the subject is in the nominative and other dependents are in the accusative, cf. (18a), but if the subject bears an idiosyncratic case, it is the object that gets the nominative, cf. (18b). Furthermore, if all arguments (if any) bear inherent case, the next "available" grammatical function is that of an adjunct, thus one of the adjuncts receives the nominative, cf. (18c)–(18d).

Given such facts, Maling (1993) claims that syntactic case is assigned in Finnish on the basis of the grammatical function hierarchy and that at least some adjuncts belong to this hierarchy. Moreover, as evidenced by (18c)–(18d), adjuncts do not form a single class in this hierarchy: although the multiplicative adverbial *yksi kerta* is nominative in (18c), this case is won over by the duration adverbial in (18d). Taking into consideration also the partitive of negation facts (measure adverbials, but not duration or frequency adverbials, behave like direct objects in the sense that they take partitive case under sentential negation), Maling (1993) extends the grammatical function hierarchy for Finnish in the following way:

(19) SUBJ > OBJ > MEASURE > DURATION > FREQUENCY

While these generalisations are developed in the context of Lexical Functional Grammar, it is not clear how they could be encoded in LFG: there are no formal mechanisms for stating such a hierarchy of grammatical functions and, additionally, all adjuncts are assumed to be elements of an unordered set.<sup>8</sup> On the other hand, given the "adjuncts as complements" approach of Bouma et al. (2001) and others, upon which at least some adjuncts are added to ARG-ST (perhaps renamed to DEPS), and assuming – as is standard in HPSG – that ARG-ST elements satisfy the obliqueness hierarchy, formalisation of the "case tier" approach is easy and consists of two implicational constraints similar to (15)–(16). The first constraint resolves the first structurally-cased element of the extended ARG-ST to nominative, whether this element is the first element of ARG-ST or not (it is not in the


<sup>7</sup>See also Zaenen & Maling (1983) and Zaenen et al. (1985) for a similar generalisation with respect to Icelandic.

<sup>8</sup>But see Przepiórkowski (2016) for an attempt to introduce a single ordered list of dependents and formalise the functional hierarchy in LFG.

case of (18b)–(18d)), and whether it corresponds to the subject, the direct object or an adjunct. The second constraint resolves the structural case of all subsequent elements, if any, to accusative.
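A compact way to see the two constraints at work is the following Python sketch (my own illustration; the input list encodes the extended ARG-ST in obliqueness order, with the relevant adverbials already appended):

```python
# Sketch of the "case tier" for Finnish: the first structurally-cased
# element of the extended ARG-ST receives nominative, all later structural
# elements receive accusative; inherently-cased elements are skipped.

def case_tier(deps):
    seen_str = False
    out = []
    for case in deps:
        if case == 'str':
            out.append('acc' if seen_str else 'nom')
            seen_str = True
        else:
            out.append(case)
    return out

print(case_tier(['str', 'str']))         # (18a): subject nom, adverbial acc
print(case_tier(['gen', 'str', 'str']))  # (18b): quirky subject, object nom
print(case_tier(['ill', 'str']))         # (18c): the adjunct gets nom
print(case_tier(['ill', 'str', 'str']))  # (18d): duration nom, frequency acc
```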

# **3 Case syncretism and neutrality**

Another important strand of HPSG work on case concerns situations in which a single syncretic form seems to simultaneously bear two (or more) case values, as in the following examples involving coordination, free relatives and parasitic gaps:<sup>9</sup>

(20) a. Kogo Janek lubi a Jerzy nienawidzi? (Polish)
        who.ACC/GEN Janek.NOM likes(OBJ.ACC) and Jerzy.NOM hates(OBJ.GEN)
        'Who does Janek like and Jerzy hate?'
     b. \* Co Janek lubi a Jerzy nienawidzi?
        what.NOM/ACC Janek.NOM likes(OBJ.ACC) and Jerzy.NOM hates(OBJ.GEN)
        Intended: 'What does Janek like and Jerzy hate?'

(22) a. Er findet und hilft Frauen. (German)
        he.NOM finds(OBJ.ACC) and helps(OBJ.DAT) women.NOM/ACC/GEN/DAT
        'He finds and helps women.'

<sup>9</sup>See also the respective chapters in this handbook. Abeillé & Chaves (2021: 744–745) deal with case syncretism in coordinated structures, Arnold & Godard (2021: Section 4.2.3) deal with free relatives and Borsley & Crysmann (2021: 551) deal with parasitic gaps.



(23) Was du mir gegeben hast, ist prächtig. (German)
     what.NOM/ACC you.NOM me.DAT given(OBJ.ACC) have is(SUBJ.NOM) wonderful
     'What you have given to me is wonderful.'

(24) English parasitic gaps (Hukari & Levine 1996: 482; Levine et al. 2001: 205):

Robin is someone who.NOM/ACC even good friends of ___.ACC believe ___.NOM should be closely watched.

In (20a), the fronted syncretic accusative/genitive form *kogo* 'who' satisfies the requirements of the two coordinated verbal constituents: in one, *lubi* 'likes' requires an accusative object, and in the other, *nienawidzi* 'hates' expects a genitive object. A form which is not syncretic between (at least) these two cases cannot occur in the place of *kogo*; this is illustrated in (20b), where the element putatively shared by the two verbal constituents is syncretic between accusative and nominative, rather than accusative and genitive. The English example (21) is similar and involves the relative pronoun *who*, syncretic between accusative and nominative. The well-known example (22) illustrates essentially the same phenomenon in German: the form *Frauen* 'women', which is fully syncretic with respect to case, simultaneously satisfies the accusative requirement of *findet* 'finds' and the dative requirement of *hilft* 'helps'. By contrast, this joint requirement is not satisfied either by *Männer*, which is accusative (among other cases) but not dative, or by *Männern*, which is dative but not accusative. The other two examples show that this phenomenon is not restricted to coordination. In (23), the syncretic form *was* 'what' simultaneously satisfies the constraint that the object of *gegeben* 'given' is accusative and that the subject of *ist* 'is' is nominative. Similarly, the extracted *who* in (24) seems to simultaneously bear the accusative case assigned by the preposition *of* and the nominative case of the subject of *should*.

Such examples were at one point considered problematic not only for HPSG, but for unification-based theories in general (Ingria 1990). The reason is that,

on the straightforward approach to case, they should all be ungrammatical. For example, in the case of (22a), the assignment of the accusative to the object of *findet* 'finds' should clash with the assignment of the dative to the object of *hilft* 'helps', as both objects are realised by the same noun *Frauen* 'women'. In other words, the attempt to unify accusative and dative should fail.

The solution first proposed by Levine et al. (2001: 207–208) is to enrich the *case* hierarchy in such a way that the unification of two different morphological cases does not necessarily result in failure.<sup>10</sup> Specifically, assuming that nominative and accusative are structural cases in English, they propose the part of the structural case hierarchy shown in Figure 2.<sup>11</sup>

Figure 2: Case hierarchy for English encoding case syncretism

Particular nominal forms are specified in the lexicon as either pure accusative (*p-acc*), pure nominative (*p-nom*) or syncretic between the two (*p-nom-acc*):


On the other hand, heads – or constraints within a case principle of the kind presented in the previous section – specify particular arguments as *nom* or *acc*. So, in the case of the parasitic gap example (24), the *acc* requirement associated with the preposition *of* and the *nom* requirement on the subject of *should* are not incompatible: their unification results in *p-nom-acc* and the shared dependent may be any form compatible with this case value, e.g., *who* (but not *whom*). Examples (20)–(23) can be handled in a similar way.
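The mechanics can be illustrated with a small Python sketch (mine; each type in Figure 2 is encoded by the set of maximal types it subsumes, so that unification becomes set intersection):

```python
# Unification in the Figure 2 hierarchy, modelled as intersection of the
# sets of maximal (lexically assignable) types each type subsumes.

HIERARCHY = {
    'case':      {'p-nom', 'p-nom-acc', 'p-acc'},
    'nom':       {'p-nom', 'p-nom-acc'},
    'acc':       {'p-acc', 'p-nom-acc'},
    'p-nom':     {'p-nom'},
    'p-acc':     {'p-acc'},
    'p-nom-acc': {'p-nom-acc'},
}

def unify(t1, t2):
    meet = HIERARCHY[t1] & HIERARCHY[t2]
    if not meet:
        raise ValueError(f'unification failure: {t1} * {t2}')
    return next(name for name, ext in HIERARCHY.items() if ext == meet)

# The parasitic gap example (24): the nom requirement of 'should' unified
# with the acc requirement of 'of' yields the syncretic type, which 'who'
# (lexically p-nom-acc) satisfies.
print(unify('nom', 'acc'))    # p-nom-acc
print(unify('p-acc', 'acc'))  # p-acc
# unify('p-acc', 'nom') would raise: 'whom' cannot fill the nominative gap
```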

<sup>10</sup>See Ingria (1990: 196) for an earlier implementation of roughly the same idea in the context of unification grammars.

<sup>11</sup>Type names follow the convention in Daniels (2002), for increased uniformity with the remainder of this section.

A situation often perceived as dual to such case neutrality, sometimes called "case underspecification", occurs when a head specifies the case of its dependent disjunctively and may combine with a coordinate structure containing phrases in both cases, e.g.:

(26) a. Polish (Przepiórkowski 1999a: 175):
        Dajcie wina i całą świnię!
        give(OBJ.ACC/GEN) wine.GEN and whole.ACC pig.ACC
        'Serve (some) wine and a whole pig!'
     b. Russian (Levy 2001: 11):
        Včera ves' den' on proždal svoju podrugu Irinu i zvonka ot svoego brata Grigorija.
        yesterday all day he expected(OBJ.ACC/GEN) self's.ACC girlfriend.ACC Irina.ACC and call.GEN from self's brother Grigory
        'Yesterday he waited all day for his girlfriend Irina and for a call from his brother Grigory.'

In Polish, the object of the verb *dajcie* 'give' is normally in the accusative, but may also be realised as the genitive, when its meaning is partitive; in (26a), the object is a coordination of such a genitive noun *wina* '(some) wine' and the accusative *całą świnię* 'whole pig'. Similarly, according to Levy (2001), the Russian verb *proždal* 'awaited' may combine with accusative or genitive, and in (26b) it happily combines with a coordinate phrase containing both.

If such "accusative and genitive" coordinate phrases bear case at all, the value of this grammatical category must be something like *acc+gen*. Note that this situation differs from case neutrality discussed above: a syncretic case such as *p-acc-gen* intuitively corresponds to intersection: a nominal bearing this case is accusative and genitive at the same time. On the other hand, the intuition behind *acc+gen* is that of union: a (coordinated) nominal with this case value has accusative elements and genitive elements, so it may fill a position disjunctively specified as requiring accusative *or* genitive. However, *acc+gen* coordinate phrases cannot fill either purely accusative positions (because such phrases contain genitive – i.e., non-accusative – conjuncts), or purely genitive positions (because of accusative – i.e., non-genitive – conjuncts), or positions simultaneously specified as accusative *and* genitive, as in (20) above (for both reasons).

This duality is a feature of the Categorial Grammar approach to case and coordination of Bayer (1996) (see also Bayer & Johnson 1995) and the corresponding HPSG analyses were presented in Levy (2001) and Levy & Pollard (2002), as well


as in Daniels (2002). As noted in Levy & Pollard (2002: 233), the two HPSG approaches are isomorphic. The main technical difference is that the relevant case hierarchies are construed outside of the usual HPSG type hierarchy in the approach of Levy (2001) and Levy & Pollard (2002), but they are fully integrated in the approach of Daniels (2002). For this reason, and also because it is the basis of some further HPSG work (e.g., Crysmann 2005), this latter approach is presented below.

Intuitively, just as the common subtype of *acc* and *nom*, i.e., *p-nom-acc* in Figure 2, represents forms which are simultaneously accusative and nominative, the common supertype, i.e., *case*, which should perhaps be renamed to *nom+acc*, should represent coordinate structures involving nominative and accusative conjuncts. However, given that all objects are assumed to be sort-resolved in standard HPSG (Richter 2021: 95, Chapter 3 of this volume), saying that the case of a coordinate structure is *case* (or *nom+acc*) is tantamount to saying that it is either *p-acc* (pure accusative), or *p-nom-acc* (syncretic nominative/accusative), or *p-nom* (pure nominative). One solution is to "make a simple change to the framework's foundational assumptions" (Sag 2003: 268) and to allow linguistic objects to bear non-maximal types. This is proposed and illustrated in detail in Sag (2003). A more conservative solution, proposed in Daniels (2002), is to add dedicated maximal types to all such non-maximal types; for example, the hierarchy in Figure 2 is modified as shown in Figure 3. Apart from the trivial

Figure 3: Case (sub)hierarchy encoding nominative/accusative syncretism and underspecification

renaming of *case* to the more explicit *nom+acc*, a maximal type corresponding to this renamed non-maximal type is added here, namely, *p-nom+acc*.

Let us illustrate this approach with the two Polish examples (20a) and (26a), repeated below as (27a) and (27b):

(27) a. Kogo Janek lubi a Jerzy nienawidzi?
        who.ACC/GEN Janek.NOM likes(OBJ.ACC) and Jerzy.NOM hates(OBJ.GEN)
        'Who does Janek like and Jerzy hate?'
     b. Dajcie wina i całą świnię!
        give wine.GEN and whole.ACC pig.ACC
        'Serve (some) wine and a whole pig!'

As these examples involve accusative and genitive, I will assume that the complete case hierarchy contains a subhierarchy such as that in Figure 3 above, but with all occurrences of *nom* replaced by *gen* as in Figure 4.

Figure 4: Case (sub)hierarchy encoding accusative/genitive syncretism and underspecification

First of all, heads subcategorise for (or relevant case principles specify) "non-pure" cases, i.e., *acc*, *gen*, *gen+acc*, etc., but not *p-acc*, *p-gen*, *p-gen+acc*, etc. For example, *lubi* 'likes' and *nienawidzi* 'hates' in (27a) expect their objects to have the case values *acc* and *gen*, respectively. Moreover, *dajcie* 'give' in (27b) specifies the case of its object as *gen+acc*. On the other hand, nominal dependents bear "pure" cases. For example, *kogo* 'who' in (27a) is lexically specified as *p-gen-acc*. Similarly to the analysis of the English parasitic gap example above, this neutralised case is compatible with both specifications: *acc* and *gen*.

The analysis of (27b) is a little more complicated, as a new principle is needed to determine the case of a coordinate structure. The two conjuncts, *wina* 'wine' and *całą świnię* 'whole pig', have – by virtue of lexical specifications of their head nouns – the case values *p-gen* and *p-acc*, respectively. Now, the case value of the coordination is determined as follows: take the "non-pure" versions of the cases of all conjuncts (here: *gen* and *acc*), find their (lowest) common supertype (here: *gen+acc*), and assign to the coordinate structure the "pure" type corresponding to this common supertype (here: *p-gen+acc*). This way the coordinate structure in (27b) ends up with the case value *p-gen+acc*, which is compatible with the *gen+acc* requirement posited by the verb *dajcie* (or by an appropriate principle of structural case assignment). Obviously, a purely accusative, purely genitive or accusative/genitive neutralised object would also satisfy this requirement.
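The coordination step just described can be rendered as a small Python function (my own sketch over the Figure 4 subhierarchy; only the atomic cases gen and acc are modelled):

```python
# Case determination for coordinate structures (Figure 4 subhierarchy):
# take the non-pure version of each conjunct's pure case, compute the
# lowest common supertype, and return the pure counterpart of that type.

def coordination_case(conjunct_cases):
    nonpure = {c.removeprefix('p-') for c in conjunct_cases}
    if len(nonpure) == 1:            # all conjuncts agree in case
        return 'p-' + nonpure.pop()
    if nonpure == {'gen', 'acc'}:    # lowest common supertype: gen+acc
        return 'p-gen+acc'
    raise NotImplementedError('extend for further case combinations')

# (27b): genitive 'wina' coordinated with accusative 'całą świnię' yields
# a coordinate NP of case p-gen+acc, which satisfies the gen+acc
# requirement of 'dajcie'.
print(coordination_case(['p-gen', 'p-acc']))  # p-gen+acc
print(coordination_case(['p-acc', 'p-acc']))  # p-acc
```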

One problem with this approach, often perceived both within and outside of HPSG, is that it leads to very complex type hierarchies for *case* and rather inelegant constraints (Sag 2003: 272, Dalrymple et al. 2009: 63–66). Let us, following Daniels (2002), simplify the presentation of type hierarchies such as that in Figure 3 by removing all those "pure" types which are only needed to represent some non-maximal types as maximal, as in Figure 5. Hence, the representation

Figure 5: Simplified case (sub)hierarchy encoding nominative/accusative syncretism and underspecification

in this figure corresponds to seven types shown explicitly in Figure 3 (each non-maximal type in Figure 5 has an additional *p-* type, while the maximal *nom-acc* in Figure 5 is the same as *p-nom-acc* in Figure 3). What would a similar hierarchy for three morphological cases look like? Daniels (2002: 143) provides the visualisation in Figure 6, involving 18 nodes corresponding to 35 types in the full type hierarchy. As mentioned in Levy & Pollard (2002: 225), the size of such a type

Figure 6: Simplified case (sub)hierarchy encoding accusative/dative/genitive syncretism and underspecification


hierarchy grows doubly exponentially with the number of grammatical cases, so it would already be next to impossible to visualise such a hierarchy for German, with its four cases, not to mention Polish with its seven cases or Finno-Ugric languages with around 15 cases. And matters are further complicated by the fact that sometimes form syncretism simultaneously involves a number of grammatical categories, so perhaps such type hierarchies should combine case information with person, gender and number (Daniels 2002: 145, Crysmann 2005), and by the fact that coordinated elements may be specified for different categories (e.g., an NP specified for case may be coordinated with a sentence, see also Abeillé & Chaves 2021: Section 6, Chapter 16 of this volume), in which case it is not clear what categories should be borne by the coordinate structure as a whole (see, e.g., the inconclusive fn. 10 in Sag 2003: 277).
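To make this growth concrete, one can count nodes under the assumption – mine, though it is consistent with the 4 nodes of Figure 5 and the 18 nodes of Figure 6 – that the simplified hierarchy for $n$ morphological cases contains all meets and joins of the atomic cases, i.e., forms a free distributive lattice on $n$ generators; its size is then the $n$-th Dedekind number $M(n)$ minus the two trivial elements:

$$|\mathcal{H}_n| = M(n) - 2,\qquad |\mathcal{H}_2| = 4,\quad |\mathcal{H}_3| = 18,\quad |\mathcal{H}_4| = 166.$$

On this reckoning (my calculation), German's four cases would already require 166 nodes, and Polish's seven cases a hierarchy on the order of $10^{12}$ nodes.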

After the early 2000s, such complex *case* hierarchies do not appear in HPSG work. A possible reason for this is the increasing popularity of ellipsis-based accounts of various coordinate constructions, including unlike category coordination cases, of which the "case underspecification" examples (26) may be seen as special cases.<sup>12</sup> Such ellipsis accounts are usually formulated within the linearisation approach of Reape (1992; 1994) and Kathol (1995), and they have been claimed to deal with some of the cases discussed in this section, e.g., by Crysmann (2008), Beavers & Sag (2004), and Chaves (2006; 2008). However, such linearisation-based approaches to coordination have more recently come under attack: see Levine (2011) and Kubota & Levine (2015) (see also Yatabe 2012; 2016 and, especially, Yatabe & Tam 2021 for a defence of ellipsis-based accounts of some cases of coordination).<sup>13</sup> Hence, it is difficult to predict at the moment whether ellipsis-based analyses will permanently remove the need for complex type hierarchies modelling neutralisation and underspecification in coordination. But even if they do, some of the examples given at the beginning of this section, namely (23)–(24), demonstrate that feature neutrality is not limited to coordinate structures, but also occurs at least in free relatives and multiple gapping, so case hierarchies of the kind illustrated in Figure 2, with separate types representing syncretic cases, are still needed in contemporary HPSG, regardless of the analysis of coordination; an example of a more recent analysis which does assume such

<sup>12</sup>Another HPSG approach to unlike category coordination which obviates the need for such complex hierarchies is that of Yatabe (2004), according to which the – perhaps disjunctive or underspecified – requirements of the head independently distribute to all conjuncts, in a manner similar to (but more general than) distributivity within coordinate structures assumed in LFG (Dalrymple & Kaplan 2000; Dalrymple et al. 2009; Przepiórkowski & Patejuk 2012).

<sup>13</sup>See also the chapters by Nykiel & Kim (2021) and Abeillé & Chaves (2021) for discussions of HPSG analyses of ellipsis and coordination, respectively.

a case hierarchy (to account for gapping and resumptive pronouns in Modern Standard Arabic) is Alotaibi & Borsley (2013).<sup>14</sup>

# **4 Other HPSG work on case**

Apart from the two clearly identifiable strands of HPSG work described in the two preceding sections, there are also single papers concerned with various theoretical and implementational aspects of grammatical case. Of these, the report by Drellishak (2008) on modelling complex case phenomena in the Grammar Matrix (Bender et al. 2002) has the widest typological scope. It describes the treatment of various case systems in the multilingual platform for implementing HPSG grammars: not only the pure nominative-accusative, ergative-absolutive and tripartite systems, but also systems with various types of split ergativity, systems – known from Austronesian languages, including Tagalog – in which case marking interacts with focus marking, and so-called "direct-inverse" systems, exemplified by Algonquian languages, in which case marking partially depends on the hierarchies – or scales – of nominal phrases, e.g., based on person and/or animacy. Similarly to the non-configurational case assignment principles discussed in Section 2 above, such systems are described – via constraints on specific lexical types – by specifying case values of elements on ARG-ST. Also, a typologically very interesting language, Nias, usually assumed to display the ergative-absolutive alignment but with the typologically exceptional property of marking the absolutive – rather than the ergative – case, is reanalysed as a nominative-accusative language in Crysmann (2009), with the sole argument of intransitive verbs mapped to the grammatical function of object, rather than subject.

Two other works mentioned here are concerned with two very different aspects of the case systems of particular languages. Ryu (2013) investigates the issue of case spreading from an argument of a verb to certain nominal dependents of this argument in Korean. He examines the semantic relations that must hold between the two nominals for such "case copying" to occur and proposes a repertoire of 16 semantic relations (collected in five coherent groups, further classified into two general classes) which make the spreading of the nominative possible, 10 of which (three of the five groups, one of the two classes) license the spreading of the accusative. On the syntactic side, the dependents of such nominal arguments are raised to become valency elements of the governing verbs. In particular, dependents of the subject are raised to the valence list for subjects SUBJ, resulting

<sup>14</sup>But see Crysmann (2017) for a reanalysis which does not need to refer to such a case hierarchy.


in multiple elements within the SUBJ list of a single verb. Configurational case assignment rules constrain the value of case of each valency subject to nominative, and of each valency complement to accusative. The paper does not discuss the (im)possibility of formulating such case assignment rules non-configurationally, within local ARG-ST (or DEPS), but the challenge for the non-configurational case assignment seems to be the fact that multiple argument structure elements may correspond to valency subjects (and multiple to valency complements), so – looking at the argument structure alone – it is not immediately clear how many initial elements of this list should be assigned the nominative case, and which final elements should get the accusative.

Finally, a very different aspect of Hungarian case is investigated in Thuilier (2011), namely, whether case affixes should be distinguished from postpositions and, if so, where to draw the line. In Hungarian, postpositions behave in some respects just like case affixes (e.g., they do not allow any intervening material between them and the nominal phrase), which has led some researchers to deny the existence of the affix/postposition distinction. Thuilier (2011) shows that, in this case, the traditional received wisdom is right, and that case affixes and postpositions differ in a number of morphological and syntactic ways. The proposed tests suggest that the essive element *ként*, normally considered to be a case affix, should be reanalysed as a postposition, thus establishing the number of Hungarian cases as 16. The resulting analysis of Hungarian case affixes and postpositions is couched within Sign-Based Construction Grammar (Boas & Sag 2012).

In summary, while HPSG is perhaps not best known for its approach to grammatical case, it does offer a range of interesting accounts of a variety of case-related phenomena in diverse languages ranging from German, Icelandic and Polish through Finnish and Hungarian to Korean and Nias; it provides perhaps the only formal implementation of the influential "case tier" idea; and it successfully captures somewhat conflicting intuitions concerning the locality of case assignment.

# **Acknowledgements**

I would like to thank the following colleagues for their comments on a previous version of this chapter: Rui Chaves, Tony Davis, Jean-Pierre Koenig, Detmar Meurers, Stefan Müller and Shûichi Yatabe. I wish I could blame them for any remaining errors and omissions.

# **References**


*ceedings of the 12th International Conference on Head-Driven Phrase Structure Grammar, Department of Informatics, University of Lisbon*, 91–107. Stanford, CA: CSLI Publications. http://csli- publications.stanford.edu/HPSG/2005/ crysmann.pdf (10 February, 2021).




(Oxford Handbooks in Linguistics), 72–87. Oxford: Oxford University Press. DOI: 10.1093/oxfordhb/9780199206476.013.0006.


Müller, Stefan. 2003. Mehrfache Vorfeldbesetzung. *Deutsche Sprache* 31(1). 29–62.



*National University Daejeon*, 453–473. Stanford, CA: CSLI Publications. http: //csli-publications.stanford.edu/HPSG/2012/yatabe.pdf (10 February, 2021).


# **Chapter 8**

# **Nominal structures**

# Frank Van Eynde

University of Leuven

This chapter shows how nominal structures are treated in HPSG. The introduction puts the discussion in the broader context of the NP vs. DP debate and differentiates three HPSG treatments: the specifier treatment, the DP treatment and the functor treatment. They are each presented in some detail and applied to the analysis of ordinary nominals. A comparison reveals that the DP treatment does not mesh as well with the monostratal surface-oriented nature of the HPSG framework as the other treatments. Then it is shown how the specifier treatment and the functor treatment deal with nominals that have idiosyncratic properties, such as the gerund, the Big Mess Construction and irregular P+NOM combinations.

# **1 Introduction**

I use the term *nominal* in a broad and non-technical sense as standing for a noun and its phrasal projection. All of the bracketed strings in (1) are, hence, nominals.

(1) [the [red [box]]] has disappeared

The analysis of nominals continues to be a matter of debate. Advocates of the NP approach treat the noun as the head of the nominal, not only in *red box* but also in *the red box*. Advocates of the DP approach, by contrast, make a distinction between the nominal core, consisting of a noun with its complements and modifiers, if any, and a functional outer layer, comprising determiners, quantifiers and numerals. They, hence, treat the noun as the head of *red box* and the determiner as the head of *the red box*, so that the category of *the red box* is DP.

The NP approach remained unchallenged throughout the first decades of generative grammar. The Government and Binding model (Chomsky 1981), for instance, employed the phrase structure rule in (2).

Frank Van Eynde. 2021. Nominal structures. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 275–313. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599832


(2) NP → Det Nom

Phrase structure rules were required to "meet some variety of X-bar theory" (Chomsky 1981: 5). The original variety is that of Chomsky (1970). It consists of the following cross-categorial rule schemata:

$$\begin{array}{llcl} \text{(3)} & \text{a. } \mathrm{X}' & \to & \mathrm{X}\ \dots \\ & \text{b. } \mathrm{X}'' & \to & [\text{Spec}, \mathrm{X}']\ \mathrm{X}' \end{array}$$

X′ stands for the combination of a head and its complements, where X is N, A or V, and X″ stands for the combination of X′ and its specifier "where [Spec, N′] will be analyzed as the determiner" (Chomsky 1970: 210). X-bar theory was further developed in Jackendoff (1977), who added a schema for the addition of adjuncts and who extended the range of X with P, the category of adpositions. A monostratal version of X-bar theory is developed in Generalized Phrase Structure Grammar (GPSG). Its application to nominals is exemplified in Figure 1, quoted from Gazdar et al. (1985: 126). The top node is the double-bar category N″, which consists of the determiner and the single-bar category N′. The AP and the relative clause (S[+R]) are adjoined to N′, and the lowest N′ consists of the noun and its PP complement.

Figure 1: An instance of the NP approach, applied to *that very tall sister of Leslie who we met*

The DP approach results from an extension of the range of X in (3) to the functional categories. This was motivated by the fact that some of the phrase structure rules, such as (4), do not fit the X-bar mould.

8 Nominal structures

(4) S → NP Aux VP

To repair this, the category Aux, which contained both auxiliaries and inflectional verbal affixes (Chomsky 1957), was renamed as I(nfl) and treated as the head of S. More specifically, I(nfl) was claimed to combine with a VP complement, yielding I′, and I′ was claimed to combine with an NP specifier (the subject), yielding I″ (formerly S). For the analysis of nominals such an overhaul did not at first seem necessary, since the relevant PS rules did fit the X-bar mould, but it took place nonetheless, mainly in order to capture similarities between nominal and clausal structures. These are especially conspicuous in gerunds, nominalized infinitives and nominals with a deverbal head, and were seen as evidence for the claim that determiners have their own phrasal projection, just like the members of I(nfl) (Abney 1987). More specifically, members of D were claimed to take an N″ complement, yielding D′, and D′ was claimed to have a potentially empty specifier sister, as in Figure 2. The DP approach was also taken on board in other frameworks, such as Word Grammar (Hudson 1990) and Lexical Functional Grammar (Bresnan 2001: 99).

Figure 2: An instance of the DP approach

Turning now to Head-Driven Phrase Structure Grammar, we find three different treatments. The first and oldest can be characterized as a lexicalist version of the NP approach, more specifically of its monostratal formulation in GPSG, see Pollard & Sag (1987: Sections 4.4 and 5.7), Pollard & Sag (1994: Sections 1.7 to 1.9 and 9.4), Ginzburg & Sag (2000: 189–195) and Machicao y Priemer & Müller


(2021). I henceforth call it the *specifier treatment*, after the role which it assigns to the determiner. The second is a lexicalist version of the DP approach. It is first proposed in Netter (1994) and further developed in Netter (1996) and Nerbonne & Mullen (2000). I will call it the *DP treatment*. The third adopts the NP approach, but neutralizes the distinction between adjuncts and specifiers, treating them both as functors. It is first proposed in Van Eynde (1998) and Allegranza (1998) and further developed in Van Eynde (2003), Van Eynde (2006) and Allegranza (2007). It is also adopted in Sign-Based Construction Grammar (Sag 2012: Section 8.4).<sup>1</sup> I will call it the *functor treatment*. This chapter presents the three treatments and compares them wherever this seems appropriate.

I first focus on ordinary nominals (Section 2) and then on nominals with idiosyncratic properties (Section 3). For exemplification I use English and a number of other Germanic and Romance languages, including Dutch, German, Italian and French. I assume familiarity with the typed feature description notation and with such basic notions as inheritance and token-identity; see Richter (2021), Chapter 3 of this volume and Abeillé & Borsley (2021), Chapter 1 of this volume.<sup>2</sup>

# **2 Ordinary nominals**

I use the term *ordinary nominal* for a nominal that contains a noun, any number of complements and/or adjuncts and at most one determiner. This section shows how such nominals are analyzed in the specifier treatment (Section 2.1), the DP treatment (Section 2.2) and the functor treatment (Section 2.3).

# **2.1 The specifier treatment**

The specifier treatment adopts the same distinction between heads, complements, specifiers and adjuncts as X-bar theory, but its integration in a monostratal lexicalist framework inevitably leads to non-trivial differences, as will be demonstrated in this section. The presentation is mainly based on Pollard & Sag (1994) and Ginzburg & Sag (2000). I first discuss the syntactic structure (Section 2.1.1) and the semantic composition (Section 2.1.2) of nominals, and then turn to nominals with a phrasal specifier (Section 2.1.3).

<sup>1</sup>On SBCG in general, see Müller 2021b: Section 1.3.2, Chapter 32 of this volume.

<sup>2</sup>This chapter does not treat relative clauses, since they are the topic of a separate chapter (Arnold & Godard 2021, Chapter 14 of this volume).


### **2.1.1 Syntactic structure**

Continuing with the same example as in Figure 2, a relational noun, such as *sister*, selects a PP as its complement and a determiner as its specifier, as spelled out in the following CATEGORY value:

$$\text{(5)}\quad \begin{bmatrix} \textit{category} \\ \text{HEAD}\ \textit{noun} \\ \text{SPR}\ \langle \text{DET} \rangle \\ \text{COMPS}\ \langle \text{PP}[\textit{of}] \rangle \end{bmatrix}$$
The combination with a matching PP is subsumed by the *head-complements-phrase* type, and yields a nominal with an empty COMPS list (see also Abeillé & Borsley 2021: Section 5.1, Chapter 1 of this volume on structures of type *head-complements-phrase*). Similarly, the combination of this nominal with a matching determiner is subsumed by the *head-specifier-phrase* type, and yields a nominal with an empty SPR list, as spelled out in Figure 3.

Figure 3: Adnominal complements and specifiers

Since the noun is the head of *sister of Leslie* and since *sister of Leslie* is the head of *that sister of Leslie*, the Head Feature Principle<sup>3</sup> implies that the phrase as a whole shares the HEAD value of the noun (1). The valence features, COMPS and SPR, have a double role. On the one hand, they register the degree of saturation of the nominal; in this role they supersede the bar levels of X-bar theory. On the other hand, they capture co-occurrence restrictions, such as the fact that the complement of *sister* is a PP, rather than an NP or a clause.
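The double role of the valence features can be read procedurally, as in the following Python toy model (mine, not part of the formalism; the atomic category labels are ad hoc): head-complements saturates COMPS, head-specifier saturates SPR, and the HEAD value is passed up unchanged.

```python
# Toy model of the structures in Figure 3: head-complements-phrase checks
# the complements against COMPS and empties it; head-specifier-phrase then
# checks the specifier against SPR. HEAD is shared with the mother
# throughout (Head Feature Principle).

def head_complements(head, complements):
    assert [c['head'] for c in complements] == head['comps']
    return {'head': head['head'], 'spr': head['spr'], 'comps': []}

def head_specifier(specifier, head):
    assert head['comps'] == [] and head['spr'] == [specifier['head']]
    return {'head': head['head'], 'spr': [], 'comps': []}

sister = {'head': 'noun', 'spr': ['det'], 'comps': ['prep']}
of_leslie = {'head': 'prep', 'spr': [], 'comps': []}
that = {'head': 'det', 'spr': [], 'comps': []}

n_bar = head_complements(sister, [of_leslie])  # 'sister of Leslie'
np = head_specifier(that, n_bar)               # 'that sister of Leslie'
print(np)  # {'head': 'noun', 'spr': [], 'comps': []} -- fully saturated
```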

In contrast to complements and specifiers, adjuncts are not selected by their head sister. Instead, they are treated as selectors of their head sisters. To model

<sup>3</sup>See Pollard & Sag (1994: 34) and Abeillé & Borsley (2021: Section 5.1), Chapter 1 of this volume.


this, Pollard & Sag (1994: 55–57) employ the feature MOD(IFIED). It is part of the HEAD value of the substantive parts-of-speech, i.e. noun, verb, adjective and adposition. Its value is of type *synsem* in the case of adjuncts and of type *none* otherwise.

(6) *substantive*: [ MOD *synsem* ∨ *none* ]

Attributive adjectives, for instance, select a nominal head sister which requires a specifier, as spelled out in (7).
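
```
(7)
      adjective
      MOD
            LOC|CATEGORY
                  HEAD noun
                  SPR ⟨ DET ⟩
```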

The token-identity of the MOD(IFIED) value of the adjective with the SYNSEM value of its head sister is part of the definition of the type *head-adjunct-phrase* (Abeillé & Borsley 2021: Section 5.1). The requirement that the SPR value of the selected nominal be a non-empty list blocks the addition of adjectives to nominals which contain a determiner, as in \**tall that bridge*.<sup>4</sup> Since the MOD(IFIED) feature is part of the HEAD value, it follows from the Head Feature Principle that it is shared between an adjective and the AP which it projects. As a consequence, the MOD(IFIED) value of *very tall* is shared with that of *tall*, as shown in Figure 4.

Figure 4: Adnominal modifiers

<sup>4</sup>This constraint is overruled in the Big Mess Construction, see Section 3.3.


For languages in which attributive adjectives show number and gender agreement with the nouns they modify, the selected nominal is required to have specific number and gender values. The Italian *grossa* 'big', for instance, selects a singular feminine nominal and is, hence, compatible with a noun like *scatola* 'box', but not with the plural *scatole* 'boxes' nor with the masculine *libro* 'book' or *libri* 'books'.<sup>5</sup>

### **2.1.2 Semantic composition**

Semantic representations in HPSG do not constitute a separate level of representation, but take the form of attribute value pairs that are added to the syntactic representations. Phrase formation and semantic composition are, hence, modeled in tandem. Technically, the CONTENT feature is declared for the same type of objects as the CATEGORY feature, as spelled out in (8).

(8) *local*: [ CATEGORY *category*, CONTENT *semantic-object* ]

In the case of nominals, the value of the CONTENT feature is of type *scope-object*, a subtype of *semantic-object* (Ginzburg & Sag 2000: 122). A scope-object is an index-restriction pair in which the index stands for entities and the restriction is a set of facts which constrain the denotation of the index, as in the CONTENT value of the noun *box*:

$$\text{(9)} \quad \begin{bmatrix} \textit{scope-object} \\ \text{INDEX} \ \boxed{1} \ \textit{index} \\ \text{RESTR} \left\{ \begin{bmatrix} \textit{box} \\ \text{ARG} \ \boxed{1} \end{bmatrix} \right\} \end{bmatrix}$$

This is comparable to the representations which are canonically used in Predicate Logic (PL), such as { x | *box*(x) }, where x stands for the entities that the predicate *box* applies to. In contrast to PL variables, HPSG indices are sorted with respect to person, number and gender. This provides the means to model the type of agreement that is called *index agreement* (Wechsler 2021: Section 4.2, Chapter 6 of this volume).

(10) *index*: [ PERSON *person*, NUMBER *number*, GENDER *gender* ]

<sup>5</sup>This is an instance of concord (Wechsler 2021: Section 4.2, Chapter 6 of this volume).


CONTENT values of attributive adjectives are also of type *scope-object*. When combined with a noun, as in *red box*, the resulting representation is one in which the indices of the adjective and the noun are identical, as in (11).<sup>6</sup>

$$\text{(11)} \quad \begin{bmatrix} \textit{scope-object} \\ \text{INDEX} \ \boxed{1} \\ \text{RESTR} \left\{ \begin{bmatrix} \textit{red} \\ \text{ARG} \ \boxed{1} \end{bmatrix}, \begin{bmatrix} \textit{box} \\ \text{ARG} \ \boxed{1} \end{bmatrix} \right\} \end{bmatrix}$$

This too is comparable to the PL practice of representing such combinations with one variable to which both predicates apply, as in { x | *red*(x) & *box*(x) }. What triggers the index sharing is the MOD(IFIED) value of the adjective, as illustrated by the AVM of *red* in (12) (Pollard & Sag 1994: 55).

$$\text{(12)} \quad \begin{bmatrix} \text{CATEGORY|HEAD} \begin{bmatrix} \textit{adjective} \\ \text{MOD|LOC|CONTENT} \begin{bmatrix} \textit{scope-object} \\ \text{INDEX} \ \boxed{1} \\ \text{RESTR} \ \boxed{2} \end{bmatrix} \end{bmatrix} \\ \text{CONTENT} \begin{bmatrix} \textit{scope-object} \\ \text{INDEX} \ \boxed{1} \\ \text{RESTR} \left\{ \begin{bmatrix} \textit{red} \\ \text{ARG} \ \boxed{1} \end{bmatrix} \right\} \cup \boxed{2} \end{bmatrix} \end{bmatrix}$$

The adjective selects a *synsem* whose CONTENT value is a scope-object, shares its index and adds its restriction to those of the *synsem* it selects. The resulting CONTENT value is then shared with the mother.

To model the semantic contribution of determiners, Ginzburg & Sag (2000: 135–136) make a distinction between scope-objects that contain a quantifier (*quant-rel*), and those that do not (*parameter*). The addition of a quantifying determiner to a nominal, as in *every red box*, triggers a shift from *parameter* to *quant-rel*. To capture this shift, the specifier treatment employs the feature SPEC(IFIED). It is part of the HEAD value of determiners, and its value is of type *semantic-object* (Ginzburg & Sag 2000: 362).<sup>7</sup>

(13) *determiner*: [ SPEC *semantic-object* ]

In the case of *every*, the SPEC value is an object of type *parameter*, but its own CONTENT value is a subtype of *quant-rel* and this quantifier is put in store, to be

<sup>6</sup>This is an example of intersective modification. The semantic contribution of other types of adjectives, such as *alleged* and *fake*, is modeled differently (Pollard & Sag 1994: 330–331). See also Koenig & Richter (2021: Section 3.2), Chapter 22 of this volume.

<sup>7</sup> In Pollard & Sag (1994: 45) the SPEC(IFIED) feature was also assigned to other function words, such as complementizers, and its value was of type *synsem*.


retrieved at the place where its scope is determined, as illustrated by the AVM of *every* in (14) (Ginzburg & Sag 2000: 204).
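
```
(14)
      HEAD
            determiner
            SPEC
                  parameter
                  INDEX 1
                  RESTR 2
      CONTENT 3
            every-rel
            INDEX 1
            RESTR 2
      STORE { 3 }
```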

Notice that the addition of the SPEC feature yields an analysis in which the determiner and the nominal select each other: the nominal selects its specifier by means of the valence feature SPR and the determiner selects the semantic content of the nominal by means of SPEC.

### **2.1.3 Nominals with a phrasal specifier**

Specifiers of nominals tend to be single words, but they can also take the form of a phrase. The bracketed phrase in [*the Queen of England's*] *sister*, for instance, is in complementary distribution with the possessive determiner in *her sister* and has a comparable semantic contribution. For this reason it is treated along the same lines. More specifically, the possessive marker *'s* is treated as a determiner that takes an NP as its specifier, as shown in Figure 5 (Pollard & Sag 1994: 51–54; Ginzburg & Sag 2000: 193).<sup>8</sup>

In this analysis, the specifier of *sister* is a DetP that is headed by *'s*, which in turn takes the NP *the Queen of England* as its specifier.<sup>9</sup> Semantically, *'s* relates the index of its specifier (the possessor) to the index of the nominal that it selects (the possessed), as spelled out in (15).<sup>10</sup>
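
```
(15)
      HEAD
            determiner
            SPEC
                  INDEX 1
                  RESTR 2
      SPR ⟨ NP[INDEX 3] ⟩
      CONTENT
            the-rel
            INDEX 1
            RESTR 2 ∪ { poss(3,1) }
```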

<sup>8</sup>The treatment of the phonologically reduced *'s* as the head of a phrase is comparable to the treatment of the homophonous word in *he's ill* as the head of a VP. Notice that the possessive *'s* is not a genitive affix, for if it were, it would be affixed to the head noun *Queen*, as in \**the Queen's of England sister* (see Sag, Wasow & Bender 2003: 199).

<sup>9</sup>Since the specifier of *'s* is an NP, it may in turn contain a specifier that is headed by *'s*, as in *John's uncle's car*.

<sup>10</sup>The terms *possessor* and *possessed* are meant to be understood in a broad, not-too-literal sense (Nerbonne 1992: 8–9).


Figure 5: A phrasal specifier

The assignment of *the-rel* as the CONTENT value captures the definiteness of the resulting NP. Notice that this analysis contains a DetP, but in spite of that, it is not an instance of the DP approach, since the determiner does not head the nominal as a whole, but only its specifier.

# **2.2 The DP treatment**

An HPSG version of the DP approach has been developed in Netter (1994) and Netter (1996). I sketch the main characteristics of this treatment in Section 2.2.1 and discuss some problems for it in Section 2.2.2.

### **2.2.1 Functional complementation and functional completeness**

The combination of a noun with its complements and its adjuncts is analyzed in much the same way as in the specifier treatment. The addition of the determiner,


though, is modeled differently. It is not the nominal that selects the determiner as its specifier, but rather the determiner that selects the nominal as its complement. More specifically, it selects the nominal by means of the valence feature COMPS and the result of the combination is a DP with an empty COMPS list, as in Figure 6.

Figure 6: Propagation of the HEAD and COMPS values

In this analysis there is no need for the valence feature SPR in the NP. This looks like a gain, but in practice it is offset by the introduction of a distinction between functional complementation and ordinary complementation. To model it, Netter (1994: 307–308) differentiates between major and minor HEAD features:

$$\text{(16)} \quad \begin{bmatrix} \textit{head} \\ \text{MAJOR} \begin{bmatrix} \text{N} \ \textit{boolean} \\ \text{V} \ \textit{boolean} \end{bmatrix} \\ \text{MINOR} \ \textit{minor} \end{bmatrix}$$

The MAJOR attribute includes the boolean features N and V, where nouns are [+N, –V], adjectives [+N, +V], verbs [–N, +V] and adpositions [–N, –V]. In addition, [+N] categories have the features CASE, NUMBER and GENDER. Typical of functional complementation is that the functional head shares the MAJOR value of its complement, as specified in (17).

(17) Functional Complementation: In a lexical category of type *func-cat* the value of its MAJOR attribute is token-identical with the MAJOR value of its complement (Netter 1994: 311–312).

Since determiners are of type *func-cat*, they share the MAJOR value of their nominal complement, and since that value is also shared with the DP (given the Head Feature Principle), it follows that the resulting DP is [+N, –V] and that its CASE,


NUMBER and GENDER values are identical to those of its nominal non-head daughter. Nouns, by contrast, are not of type *func-cat* and, hence, do not share the MAJOR value of their complement. The noun *sister* in Figure 6, for instance, does not share the part-of-speech of its PP complement.

The MINOR attribute is used to model properties which a functional head does *not* share with its complement. It includes FCOMPL, a feature which registers whether a projection is functionally complete or not. Its value is positive for determiners, negative for singular count nouns and underspecified for plurals and mass nouns. Determiners take a nominal complement with a negative FCOMPL value, but their own FCOMPL value is positive, and since they are the head, they share this value with the mother, as in Figure 7.

Figure 7: Propagation of the MAJOR and MINOR values
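
Putting the MAJOR and MINOR specifications together, the CATEGORY of a determiner in this treatment can be sketched as follows (a simplified rendering, abstracting away from details of Netter's feature geometry):

```
determiner:
      HEAD
            MAJOR 1
            MINOR|FCOMPL +
      COMPS ⟨ HEAD
                  MAJOR 1 [ N +, V − ]
                  MINOR|FCOMPL − ⟩
```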

In this analysis, a nominal is complete if it is both saturated (empty COMPS list) and functionally complete (positive FCOMPL), as spelled out in (18) (Netter 1994: 312).

(18) Functional Completeness Constraint: Every maximal projection is marked as functionally complete in its MINOR feature.

### **2.2.2 Two problems for the DP treatment**

Given the definition of functional complementation in (17), determiners share the MAJOR value of the nominals which they select and are, hence, nominal themselves, i.e. [+N, –V]. However, while this makes sense for determiners with (pro)nominal properties, such as the English demonstrative *that*, it is rather implausible for determiners with adjectival properties, such as the German interrogative *welch-* 'which' and the Italian demonstrative *questo* 'this', which show the same variation for number, gender and case as the adjectives and which are subject to the same requirement on concord with the noun as adnominal adjectives. Since such determiners have more in common with adjectives than with (pro)nouns, it would be more plausible to treat them as members of [+N, +V].


The problem also affects the associated agreement features, i.e. CASE, NUMBER and GENDER. If a determiner is required to share the values of these features with its nominal complement, as spelled out in (17), then one gets implausible results for nominals in which the determiner and the noun do not show agreement. In the Dutch *'s lands hoogste bergen* 'the country's highest mountains', for instance, the selected nominal (*hoogste bergen*) is plural and non-genitive, while the selecting determiner (*'s lands*) is singular and genitive. The assumption that the determiner shares the case and number of its nominal sister is, hence, problematic.

Another problem concerns the assumption "that all substantive categories will require the complement they combine with to be both saturated and functionally complete" (Netter 1994: 311). Complements of verbs and adpositions must, hence, be positively specified for FCOMPL. This is contradicted by the existence of adpositions which require their complement to be functionally incomplete. The Dutch *te* and *per*, for instance, require a determinerless nominal, even if the nominal is singular and count, as in *te (\*het) paard* 'on horse' and *per (\*de) trein* 'by train'. A reviewer points out that this is not necessarily a problem for the DP approach, but only for Netter's version of it. Technically, it may indeed suffice to drop the erroneous assumption, but conceptually the existence of adpositions which require a determinerless nominal does suggest that the NP approach is more plausible, especially since there are no adpositions (nor verbs) which require a nounless nominal.<sup>11</sup>

# **2.3 The functor treatment**

The functor treatment adopts the NP approach, but in contrast to the specifier treatment, it does not model specification and adjunction in different terms, and it does not adopt the distinction between substantive (or lexical) categories and functional categories.<sup>12</sup> The presentation in this section is mainly based on Van Eynde (2006) and Allegranza (2007). I first discuss the motivation which underlies the adoption of the functor treatment (Section 2.3.1) and then present its basic properties (Section 2.3.2). I then turn to nominals with a phrasal specifier (Section 2.3.3) and to the hierarchy of MARKING values (Section 2.3.4).

<sup>11</sup>NP treatments of adpositions with a determinerless nominal are provided in Van Eynde (2004) for Dutch and in Kiss (2008) for German.

<sup>12</sup>The term *functor* is also used in Categorial (Unification) Grammar, where it has a very broad meaning, subsuming the non-head daughter in combinations of a head with a specifier or an adjunct, and the head daughter otherwise; see Bouma (1988). This broad notion is also adopted in Reape (1994). I adopt a more restrictive version in which functors are non-head daughters which lexically select their head sister. For a general comparison of HPSG and Categorial Grammar see also Kubota (2021), Chapter 29 of this volume.


### **2.3.1 Motivation**

The distinction between specifiers and adjuncts is usually motivated by the assumption that the former are obligatory and non-stackable, while the latter are optional and stackable. In practice, though, this distinction is blurred by the fact that many nominals are well-formed without a specifier. Bare plurals and singular mass nouns, for instance, are routinely used without a specifier in English, and many other languages allow singular count nouns without a specifier too. The claim that specifiers are obligatory is, hence, to be taken with a large pinch of salt. The same holds for their non-stackability. Italian possessives, for instance, are routinely preceded by an article, as in *il nostro futuro* 'the our future' and *un mio amico* 'a friend of mine'. This is also true for the Greek demonstratives, which are canonically preceded by the definite article. English, too, has examples of this kind, as in *his every wish*.

Similar remarks apply to the distinction between lexical and functional categories. This distinction plays a prominent role in the specifier and the DP treatment, both of which treat determiners as members of a separate functional category Det, that is distinct from such lexical categories as N, Adj and Adv. In practice, though, it turns out that the class of determiners is quite heterogeneous in terms of part-of-speech. Van Eynde (2006), for instance, demonstrates that the Dutch determiners come in (at least) two kinds. On the one hand, there are those which show the same inflectional variation and the same concord with the noun as prenominal adjectives: they take the affix *-e* in combination with plural and singular non-neuter nominals, but not in combination with singular neuter nominals, as shown for the adjective *zwart* 'black' in (19), for the possessive determiner *ons* 'our' in (20) and for the interrogative determiner *welk* 'which' in (21).<sup>13</sup>

(19) a. zwarte muren (Dutch)
        black wall.PL

<sup>13</sup>If the adjective is preceded by a definite determiner, it also takes the affix in singular neuter nominals. This phenomenon is treated in Section 2.3.4.


     b. welke man
        which man.SG.M
     c. welk boek
        which book.SG.N

On the other hand, there are determiners which are inflectionally invariant and which do not show concord with the noun, such as the interrogative *wiens* 'whose' and the quantifier *wat* 'some'.

(22) a. wiens ouders (Dutch)
        whose parent.PL
     b. wat verf
        some paint.SG.F
     c. wat zand
        some sand.SG.N

In that respect, they are like nouns that appear in prenominal position, as in *aluminium tafels* 'aluminum tables' and *de maximum lengte* 'the maximum length'. There are, hence, determiners with adjectival properties and determiners with nominal properties. The distinction is also relevant for other languages. The Italian possessives of the first and second person, for instance, show the same alternation for number and gender as adjectives and are subject to the same constraints on NP-internal concord, as illustrated for *nostro* 'our' in (24).

(24) a. il nostro futuro (Italian)
        the our future.SG.M



By contrast, the possessive of the third person plural, *loro* 'their', does not show any inflectional variation and does not show concord with the noun.

(25) b. la loro scuola (Italian)
        the their school.SG.F
     c. i loro genitori
        the their parent.PL.M
     d. le loro scatole
        the their box.PL.F

Confirming evidence for the distinction between adjectival and pronominal possessives is provided by the fact that *loro* is also used as a personal pronoun, whereas the other possessives are not.<sup>14</sup>

(26) Enrico ha dato una scatola a loro / \*nostro. (Italian)
     Enrico has given a box to them / our
     'Enrico gave them a box.'

There are also determiners with adverbial properties. Abeillé, Bonami, Godard & Tseng (2004), for instance, assign adverbial status to the quantifying determiner in the French *beaucoup de farine* 'much flour', and the same could be argued for such determiners as the English *enough* and its Dutch equivalent *genoeg*. In sum, there is evidence that the class of determiners is categorially heterogeneous and that a treatment which acknowledges this is potentially simpler and less stipulative than one which introduces a separate functional category for them.

### **2.3.2 Basics**

Technically, the elimination of the distinction between specifiers and adjuncts means that the SPR feature is dropped. Likewise, the elimination of the distinction

<sup>14</sup>In this context one has to use the pronoun *noi* 'us' instead.


between lexical and functional categories means that there is no longer any need for separate selection features for them; MOD(IFIED) and SPEC(IFIED) are dropped and replaced by the more general SELECT. To spell out the functor treatment in more detail, I start from the hierarchy of headed phrases in Figure 8.

Figure 8: Hierarchy of headed phrases

The basic distinction is the one between *head-argument-phrase* and *head-nonargument-phrase*. In head-argument phrases, the head daughter selects its nonhead sister(s) by means of valence features, such as COMPS and SUBJ (but not SPR!), and it is their values that register the degree of saturation of the phrase, as shown for COMPS in Section 2.1.1. In head-nonargument phrases, the degree of saturation is registered by the MARKING feature. It is declared for objects of type *category*, along with the HEAD and valence features.<sup>15</sup> Its value is shared with the head daughter in head-argument phrases and with the non-head daughter in head-nonargument phrases, as spelled out in (27) and (28) respectively.


At a finer-grained level, there is a distinction between two subtypes of *headnonargument-phrase*. There is the type, called *head-functor-phrase*, in which the non-head daughter selects its head sister. This selection is modeled by the SELECT feature. Its value is an object of type *synsem* and is required to match the SYNSEM value of the head daughter, as spelled out in (29).

 

<sup>15</sup>The MARKING feature is introduced in Pollard & Sag (1994: 46) to model the combination of a complementizer and a clause.


```
(29) head-functor-phrase ⇒
      DTRS ⟨ SYNSEM|LOC|CATEGORY|HEAD|SELECT 1 , SYNSEM 1 ⟩
```
The other subtype, called *head-independent-phrase*, subsumes combinations in which the non-head daughter does not select its head sister.<sup>16</sup> In that case the SELECT value of the non-head daughter is of type *none*, as spelled out in (30).
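
```
(30) head-independent-phrase ⇒
      DTRS ⟨ SYNSEM|LOC|CATEGORY|HEAD|SELECT none , SYNSEM synsem ⟩
```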

Figure 9: Marking and selection in nominal projections

An illustration of the functor treatment is given in Figure 9. The combination of the noun with the adjective is an instance of *head-functor-phrase*, in which the adjective selects an unmarked nominal ( 3 ), shares its MARKING value ( 5 ), and, being a nonargument, shares it with the mother as well. The combination of the resulting nominal with the demonstrative is also an instance of *head-functorphrase*, in which the demonstrative selects an unmarked nominal ( 4 ), but – differently from the adjective – its MARKING value is of type *marked*, and this value is shared with the mother ( 2 ). This accounts for the ill-formedness of \**long that bridge* and \**the that bridge*, since adnominal adjectives and articles are not compatible with a marked nominal. Whether an adnominal functor is marked or unmarked is subject to cross-linguistic variation. The Italian possessives, for instance, are unmarked and can, hence, be preceded by an article, as in *il mio cane* 'the my dog', but their French equivalents are marked: *(\*le) mon chien* '(\*the) my dog'.

<sup>16</sup>This type is introduced in Van Eynde (1998: 130). It will be used in Section 3 to deal with idiosyncratic nominals, such as the Big Mess Construction and the Binominal Noun Phrase Construction.


In this treatment, determiners are marked selectors of an unmarked nominal. Since this definition does not make reference to a specific part-of-speech, it is well-equipped to deal with the categorial heterogeneity of the determiners. The English demonstrative *that*, for instance, can be treated as a pronoun, not only when it is used in nominal position, as in *I like that*, but also when it is used adnominally, as in *I like that bike*. What captures the difference between these uses is not the part-of-speech but the SELECT value: while the adnominal *that* selects an unmarked nominal, its nominal counterpart does not select anything.
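
In simplified terms, the two uses of *that* then differ only in their SELECT value:

```
adnominal that:
      HEAD
            noun
            SELECT [ HEAD noun, MARKING unmarked ]
      MARKING marked

nominal that:
      HEAD
            noun
            SELECT none
      MARKING marked
```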

### **2.3.3 Nominals with a phrasal functor**

To illustrate how the treatment outlined in the previous section deals with phrasal functors, I take the nominal *a hundred pages*. Since the indefinite article is not compatible with a plural noun like *pages*, I assume that this phrase has a left-branching structure in which the indefinite article selects the unmarked singular noun *hundred* – its plural counterpart is *hundreds* – and in which the resulting NP selects the unmarked plural noun *pages*, as spelled out in Figure 10.

Figure 10: A phrasal functor

The HEAD value of the entire NP is identified with that of *pages* ( 1 ), which accounts, among other things, for the fact that it is plural: *a hundred pages are/\*is missing*. Its MARKING value is identified with that of *a hundred* ( 2 ). The latter selects an unmarked plural nominal ( 3 ) and, since it is itself a head-functor phrase, its HEAD value, which includes the SELECT value, is shared with that of the numeral *hundred* ( 4 ) and its MARKING value with that of the article ( 2 ). Moreover, the article selects an unmarked singular nominal ( 5 ).

This treatment provides an account for the difference between the well-formed *those two hundred pages* and the ill-formed \**those a hundred pages*. The former


is licensed, since numerals like *two* and *hundred* are unmarked, while the latter is not, since the article is marked and shares that value with *a hundred pages*.

### **2.3.4 The hierarchy of MARKING values**

The distinction between marked and unmarked nominals in the functor treatment largely coincides with the distinction between nominals with an empty and a non-empty SPR value in the specifier treatment. However, while the latter simply captures the difference between nominals with and without a determiner, the former can be used to capture finer-grained distinctions. To illustrate the need for such distinctions, let us take another look at the Dutch attributive adjectives. As already pointed out in Section 2.3.2, they take the affix *-e* in combination with plural and singular non-neuter nominals, but not in combination with singular neuter nominals, as in *zwart huis* 'black house'. A complication, though, is that they also take the affix in singular neuter nominals if they are introduced by a definite determiner, as in *het zwarte huis* 'the black house'. This has consequences for the status of nominals with a singular neuter head: *zwart huis* and *zwarte huis*, for instance, are both unmarked, but put different constraints on the combination with a determiner. To model this, Van Eynde (2006: 167) differentiates between two types of *unmarked* nominals, as shown in Figure 11.

Figure 11: Hierarchy of MARKING values

Employing the more specific subtypes, the adjectives without affix which select a singular neuter nominal have the MARKING value *bare*, while the adjectives with the affix which select a singular neuter nominal have the value *incomplete*. Since this MARKING value is shared with the mother, the MARKING value of *zwart huis* is *bare*, while that of *zwarte huis* is *incomplete*. This interacts with the SELECT value of the determiner. Non-definite determiners select a bare nominal, licensing *een zwart huis* 'a black house', but not \**een zwarte huis*. Definite determiners, by contrast, select an unmarked nominal, which implies that they are


compatible with both bare and incomplete nominals, licensing both *het zwarte huis* and *het zwart huis*.<sup>17</sup>

In a similar way, one can make finer-grained distinctions in the hierarchy of *marked* values to capture co-occurrence restrictions between determiners and nominals, as in the functor treatment of the Italian determiner system of Allegranza (2007). See also the treatment of nominals with idiosyncratic properties in Section 3.

# **2.4 Conclusion**

This section has presented the three main treatments of nominal structures in HPSG. They are all surface-oriented and monostratal, and they are very similar in their treatment of the semantics of the nominals. The differences mainly concern the treatment of the determiners and the adjuncts. In terms of the dichotomy between NP and DP approaches, the specifier and the functor treatment side with the former, while the DP treatment sides with the latter. Overall, the NP treatments turn out to be more amenable to integration in a monostratal surface-oriented framework than the DP treatment; see also Van Eynde (2020), Müller (2021a), and Machicao y Priemer & Müller (2021). Of the two NP treatments, the specifier treatment is closer to early versions of X-bar theory and GPSG. The functor treatment is closer to versions of Categorial (Unification) Grammar, and has also been adopted in Sign-Based Construction Grammar (Sag 2012: 155–157).

# **3 Idiosyncratic nominals**

This section focusses on the analysis of nominals with idiosyncratic properties. Since their analysis often requires a relaxation of the strictly lexicalist approach of early HPSG, I first introduce some basic notions of Constructional HPSG (Section 3.1). Then I present analyses of nominals with a verbal core (Section 3.2), of the Big Mess Construction (Section 3.3) and of idiosyncratic P+NOM combinations (Section 3.4). Finally, I provide pointers to analyses of other nominals with idiosyncratic properties (Section 3.5).

# **3.1 Constructional HPSG**

The lexicalist approach of early HPSG can be characterized as one in which the properties of phrases are mainly determined by properties of the constituent

<sup>17</sup>Normative grammars recommend the use of the form with the affix, but also point out that the form without affix is widely used, especially when the adjective forms a tight semantic unit with the noun.


words and only to a small extent by properties of the combinatory operations. Pollard & Sag (1994: 391), for instance, employ no more than seven types of combinations, including those which were exemplified in Section 2.1.1, i.e. head-complements, head-adjunct and head-specifier.<sup>18</sup> Over time, though, this radical lexicalism gave way to an approach in which the properties of the combinatory operations play a larger role. The small inventory of highly abstract phrase types was replaced by a finer-grained hierarchy in which the types can be associated with more specific and – if need be – idiosyncratic constraints. This development started in Sag (1997), was elaborated in Ginzburg & Sag (2000), and gained momentum afterward. Characteristic of Constructional HPSG is the use of a bidimensional hierarchy of phrasal signs. In such a hierarchy, the phrases are not only partitioned in terms of HEADEDNESS, but also in terms of a second dimension, called CLAUSALITY, as shown in Figure 12.

Figure 12: Bidimensional hierarchy of clauses

The types in the CLAUSALITY dimension are associated with constraints, in much the same way as the types in the HEADEDNESS dimension. Clauses, for instance, are required to denote an object of type *message* (Ginzburg & Sag 2000: 41).

(31) *clause* ⇒ [ SYNSEM|LOC|CONTENT *message* ]

At a finer-grained level, clauses are partitioned into declarative, interrogative, imperative, exclamative and relative clauses, each with their own constraints. Interrogative clauses, for instance, have a CONTENT value of type *question*, which

<sup>18</sup>The remaining four are head-subject, head-subject-complements, head-marker and head-filler.


is a subtype of *message*, and indicative declarative clauses have a CONTENT value of type *proposition*, which is another subtype of *message*.

Exploiting the possibilities of multiple inheritance, one can define types which inherit properties from more than one supertype. The type *declarative-head-subject-clause*, for instance, inherits the properties of *head-subject-phrase*, on the one hand, and *declarative-clause*, on the other hand. Additionally, it may have properties of its own, such as the fact that its head daughter is a finite verb (Ginzburg & Sag 2000: 43). This combination of multiple inheritance and specific constraints on maximal phrase types is also useful for the analysis of nominals with idiosyncratic properties, as will be shown in Sections 3.2 and 3.3.
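
Schematically, and abstracting away from the inherited constraints, the type-specific part of this definition can be sketched as:

```
declarative-head-subject-clause ⇒
      HEAD-DTR|SYNSEM|LOC|CATEGORY|HEAD verb[finite]
```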

# **3.2 Nominals with a verbal core**

Ordinary nominals have a nominal core, but there are also nominals with a verbal core, such as gerunds and nominalized infinitives. They are of special interest, since they figure prominently in the argumentation that triggered the shift from the NP approach to the DP approach in Transformational Grammar. Some examples of gerunds are given in (32), quoted from Quirk et al. (1985: 1290).

	- b. I dislike [Brown painting his daughter].
	- c. Brown is well known for [painting his daughter].

The bracketed phrases have the external distribution of an NP, taking the subject position in (32a), the complement position of a transitive verb in (32b) and the complement position of a preposition in (32c). The internal structure of these phrases, though, shows a mixture of nominal and verbal characteristics. Typically verbal are the presence of an NP complement in (32a)–(32c), of an adverbial modifier in (32a) and of an accusative subject in (32b). Typically nominal is the presence of the possessive in (32a).

To model this mixture of nominal and verbal properties Malouf (2000: 65) develops an analysis along the lines of the specifier treatment, in which the hierarchy of part-of-speech values is given more internal structure, as in Figure 13.

Figure 13: The gerund as a mixed category

Instead of treating *noun*, *verb*, *adjective*, etc. as immediate subtypes of *part-of-speech*, they are grouped in terms of intermediate types, such as *relational*, which subsumes (among others) verbs and adjectives; they are partitioned in terms of subtypes, such as *proper-noun* and *common-noun*; and they are extended with types that inherit properties of more than one supertype, such as *gerund*, which is a subtype of both *noun* and *relational*. In addition to the inherited properties, the gerund has some properties of its own. These are spelled out in a lexical rule which derives gerunds from the homophonous present participles (Malouf 2000: 66).

(33) Lexical rule for gerunds (Malouf 2000: 66):
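
```
      HEAD verb[prp]                 HEAD gerund
      SUBJ ⟨ 1 ⟩            ↦        SUBJ ⟨ 1 ⟩
      COMPS 2                        SPR ⟨ 1 ⟩
                                     COMPS 2
```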


This rule says that gerunds take the same complements as the present participles from which they are derived ( 2 ). Their compatibility with adverbial modifiers follows from the fact that adverbs typically modify objects of type *relational*, which is a supertype of *gerund*. The availability of different options for realizing the subject is captured by the inclusion of the subject requirement of the present participle in both the SUBJ list and the SPR list of the gerund ( 1 ). To model the two options, Malouf (2000: 15) employs the bidimensional hierarchy of phrase types in Figure 14.

The combination with an accusative subject is subsumed by *nonfin-head-subjcx*, which is a subtype of *head-subject-phrase* and *clause*. Its defining properties are spelled out in (34) (Malouf 2000: 16).<sup>19</sup>

```
(34) nonfin-head-subj-cx ⇒
      SYNSEM|LOC|CATEGORY|HEAD|ROOT −
      NON-HD-DTR|SYNSEM|LOC|CATEGORY|HEAD
            noun
            CASE acc
```

This construction type subsumes combinations of a non-finite head with an accusative subject, as in (32b). When the non-finite head is a gerund, the HEAD value of the resulting clause is *gerund* and since that is a subtype of *noun*, the

<sup>19</sup>Malouf uses the feature NON-HD-DTR to single out the non-head daughter.


Figure 14: Bidimensional hierarchy of gerundial phrases

clause is also a nominal phrase. This accounts for the fact that its external distribution is that of an NP. By contrast, the combination with a possessive subject is subsumed by *noun-poss-cx*, which is a subtype of *head-specifier-phrase* and *non-clause* (Malouf 2000: 16).<sup>20</sup>

$$\text{(35)} \quad \textit{noun-poss-cx} \Rightarrow \begin{bmatrix} \text{SYNSEM|LOC} \begin{bmatrix} \text{CATEGORY|HEAD} \ \textit{noun} \\ \text{CONTENT} \ \textit{scope-object} \end{bmatrix} \end{bmatrix}$$

This construction subsumes combinations of a nominal and a possessive specifier, as in *Brown's house*, and since *noun* is a supertype of *gerund*, it also subsumes combinations with the gerund, as in (32a).

In sum, Malouf's analysis of the gerund involves a reorganization of the part-of-speech hierarchy, a lexical rule and the addition of two construction types.

# **3.3 The Big Mess Construction**

In ordinary nominals, determiners precede attributive adjectives. Changing the order yields ill-formed combinations, such as \**long that bridge* and \**very tall every man*. However, this otherwise illegitimate order is precisely what is found in the Big Mess Construction (BMC), a term coined by Berman (1974).

	- b. [How serious a problem] is this?

<sup>20</sup>Malouf treats the English possessive as a genitive, unlike Sag et al. (2003: 199); see footnote 8.


The idiosyncratic order in (36) is required if the nominal is introduced by the indefinite article, and if the preceding AP is introduced by one of a small set of degree markers, including *so, as, how, this, that* and *too*.

### **3.3.1 A specifier treatment**

A specifier treatment of the BMC is provided in Ginzburg & Sag (2000: 201). It adopts a left-branching structure, as in [[[*so good*] *a*] *bargain*], in which *so good* is the specifier of the indefinite article and in which *so good a* is the specifier of *bargain*. This is comparable to the treatment of the possessive in [[[*the Queen of England*] *'s*] *sister*] in Section 2.1.3. However, while there is evidence that *the Queen of England's* is a constituent, since it may occur independently, as in (37), there is no evidence that *so good a* is a constituent, as shown in (38).


Instead, there is evidence that the article forms a constituent with the following noun, since it also precedes the noun when the AP is in postnominal position, as in (39).

(39) We never had [a bargain] [so good as this one].

It is, hence, preferable to assign a structure in which the AP and the NP are sisters, as in [[*so good*] [*a bargain*]].

### **3.3.2 A functor treatment**

A structure in which the AP and the NP are sisters is adopted in Van Eynde (2007), Kim & Sells (2011), Kay & Sag (2012), Arnold & Sadler (2014) and Van Eynde (2018), all of which are functor treatments. They also share the assumption that the combination is an NP and that its head daughter is the lower NP. The structure of the head daughter is spelled out in Figure 15.

The article has a MARKING value of type *a*, which is a subtype of *marked* and which it shares with the mother.<sup>21</sup>

The AP is also treated as an instance of the head-functor type in Van Eynde (2007), Kim & Sells (2011) and Van Eynde (2018). The adverb has a MARKING value of type *marked*, so that the AP is marked as well, as shown in Figure 16.

<sup>21</sup>The MARKING value of the article looks similar to its PHONOLOGY value, but it is not the same. The PHONOLOGY values of *a* and *an*, for instance, are different, but their MARKING value is not.


Figure 16: The AP *so good*

In combination with the fact that the article selects an unmarked nominal, this accounts for the ill-formedness of (40).

(40) a. \* It's a so good bargain I can't resist buying it.

b. \* A how serious problem is it?

By contrast, adverbs like *very* and *extremely* are unmarked, so that the APs which they introduce are admissible in this position, as in (41).

	- b. We struck an extremely good bargain.

To model the combination of the AP with the lower NP, it may at first seem plausible to treat the AP as a functor which selects an NP that is introduced by the indefinite article. This, however, has unwanted consequences: given that SELECT is a HEAD feature, its value is shared between the AP and the adjective, so that the latter has the same SELECT value as the AP, erroneously licensing such combinations as \**good a bargain*. To avoid this, Van Eynde (2018) models the combination in terms of a special type of phrase, called *big-mess-phrase*, whose place in the hierarchy of phrase types is defined in Figure 17.


Figure 17: Bidimensional hierarchy of nominals

The types in the HEADEDNESS dimension are a subset of those in Figure 8. The types in the CLAUSALITY dimension mainly capture semantic and category-specific properties, in analogy with the hierarchy of clausal phrases in Ginzburg & Sag (2000: 363). One of the non-clausal phrase types is *nominal-parameter*:


```
(42) nominal-parameter ⇒
      SYNSEM|LOC
            CATEGORY|HEAD noun
            CONTENT
                  parameter
                  INDEX 1
                  RESTR 2 ∪ 3
      DTRS ⟨ SYNSEM|LOC|CONTENT|RESTR 2 , 4 ⟩
      HEAD-DTR 4
            SYNSEM|LOC|CONTENT
                  parameter
                  INDEX 1
                  RESTR 3
```
The mother shares its index with the head daughter ( 1 ) and its RESTR(ICTION) value is the union of the RESTR values of the daughters ( 2 ) and ( 3 ). In the hierarchy of non-clausal phrases, this type contrasts, among others, with quantified nominals, which have a CONTENT value of type *quant-rel* (Ginzburg & Sag 2000: 203–205). A subtype of *nominal-parameter* is *intersective-modification*, as defined in (43).

```
(43) intersective-modification ⇒
      SYNSEM|LOC|CONTENT|INDEX 1
      DTRS ⟨ SYNSEM|LOC|CONTENT|INDEX 1 , X ⟩
```


This constraint requires the mother to share its index also with the non-head daughter. It captures the intuition that the noun and its non-head sister apply to the same entities, as in the case of *red box*.<sup>22</sup>

Maximal types inherit properties of one of the types of headed phrases and of one of the non-clausal phrase types. Regular nominal phrases, for instance, such as *red box*, are subsumed by a type, called *regular-nominal-phrase*, that inherits the constraints of *head-functor-phrase*, on the one hand, and *intersectivemodification*, on the other hand. Another maximal type is *big-mess-phrase*. Its immediate supertype in the CLAUSALITY hierarchy is the same as for the regular nominal phrases, i.e. *intersective-modification*, but the one in the HEADEDNESS hierarchy is different: being a subtype of *head-independent-phrase*, its non-head daughter does not select the head daughter. Its SELECT value is, hence, of type *none*. In addition to the inherited properties, the BMC has some properties of its own. They are spelled out in (44).

(44) *big-mess-phrase* ⇒
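
```
      DTRS ⟨ head-functor-phrase
                   SYNSEM|LOC|CATEGORY
                         HEAD adjective
                         MARKING marked
             , 1 ⟩
      HEAD-DTR 1
            regular-nominal-phrase
            SYNSEM|LOC|CATEGORY|MARKING a
```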

The head daughter is required to be a regular nominal phrase whose MARKING value is of type *a*, and the other daughter is required to be an adjectival headfunctor phrase with a MARKING value of type *marked*. This licenses APs which are introduced by a marked adverb, as in *so good a bargain* and *how serious a problem*, while it excludes unmarked APs, as in \**good a bargain* and \**very big a house*. Iterative application is not licensed, since (44) requires the head daughter to be of type *regular-nominal-phrase*, which is incompatible with the type *bigmess-phrase*. This accounts for the fact that a big mess phrase cannot contain another big mess phrase, as in \**that splendid so good a bargain*.

A reviewer remarked that this analysis allows combinations like *so big an expensive red house*, suggesting that it should not. It is not certain, though, that this combination is ill-formed. Notice, for instance, that the sentences in (45), quoted from Zwicky (1995: 116) and Troseth (2009: 42) respectively, are well-formed.

	- b. That's as beautiful a little black dress as I've ever seen.

<sup>22</sup>Another subtype of *nominal-parameter* is *inverted-predication*, which subsumes the Binominal Noun Phrase Construction and certain types of apposition; see Section 3.5.


In sum, the analysis of the Big Mess Phrase involves the addition of a type to the bidimensional hierarchy of phrase types, whose properties are partly inherited from its supertypes and partly idiosyncratic.

# **3.4 Idiosyncratic P+NOM combinations**

When an ordinary nominal combines with a preposition, the result is a PP. The French *de* 'of', for instance, heads a PP in *je viens de Roubaix* 'I come from Roubaix'. In *beaucoup de farine* 'much flour', by contrast, *de* has a rather different role, as argued in Abeillé et al. (2004). Similar contrasts are found in other languages. The English *of*, for instance, heads a PP in *the dog of the neighbors*, but its role in *these sort of problems* is rather different, as argued in Maekawa (2015).

### **3.4.1 A specifier treatment**

In their specifier treatment of *beaucoup de farine* 'much flour', Abeillé et al. (2004) treat *de* as a weak head. Typical of a weak head is that it shares nearly all properties of its complement, as spelled out in (46).
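
```
(46)
      HEAD 1
      SUBJ 2
      SPR 3
      COMPS ⟨ CATEGORY
                    HEAD 1
                    SUBJ 2
                    SPR 3
                    MARKING unmarked
              CONTENT 4 ⟩
      MARKING de
      CONTENT 4
```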

*de* has the same values for HEAD, SUBJ, SPR and CONTENT as its nominal complement. The only difference concerns the MARKING value: *de* requires an unmarked complement, but its own MARKING value is of type *de*. Since it shares this MARK-ING value with the mother, the latter is compatible with specifiers that require a nominal that is introduced by *de*, such as *beaucoup* 'much'/'many', whose lexical entry is given in (47).<sup>23</sup>
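
```
(47)
      HEAD
            adverb
            SPEC
                  LOC
                        CATEGORY
                              HEAD noun
                              SPR ⟨ [] ⟩
                              MARKING de
                        CONTENT
                              INDEX 1
                              RESTR 2
      CONTENT
            INDEX 1
            RESTR 2
```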

<sup>23</sup>In this entry, quoted from Abeillé et al. (2004: 18), the value of SPEC is of type *synsem*, as in Pollard & Sag (1994: 45), and not of type *semantic-object*, as in Ginzburg & Sag (2000: 362).


The selected nominal is required to be unsaturated for SPR and to have a MARKING value of type *de*. *Beaucoup* is treated as an adverb that shares the index and the restrictions of its nominal head sister. Conversely, the nominal also selects its specifier via its SPR value, following the mutual selection regime of the specifier treatment; see Section 2.1.2.

### **3.4.2 A functor treatment**

In a functor treatment of *beaucoup de farine* 'much flour', *de* and *beaucoup* are both functors. The preposition selects a nominal of type *bare* and has a MARKING value of *de* which it shares with the mother. The quantifier *beaucoup* selects a nominal with the MARKING value *de* and has a MARKING value of type *marked* which it shares with the NP as a whole, as spelled out in Figure 18.

Figure 18: An adverbial functor and a prepositional functor

Since the noun is the head daughter of *de farine*, the part-of-speech, valence and meaning of *de farine* are shared directly with *farine*, rather than via the entry for *de*, as in the weak head treatment.


Comparing the functor treatment with the weak head treatment, a major difference concerns the status of *de*. In the former it is uniformly treated as a semantically vacuous preposition; in the latter it shares the part-of-speech and CONTENT value of its complement, so that it is a noun with a CONTENT value of type *scope-object* in *beaucoup de farine* and a verb with a CONTENT value of type *state-of-affairs* in (48), where it takes an infinitival VP as its complement.

(48) De sortir un peu te ferait du bien. (French)
     to go.out a bit you would.do of.the good
     'Going out a bit would do you some good.'

In some cases, this sharing leads to analyses that are empirically implausible. An example is discussed in Maekawa (2015), who provides an analysis of English nominals of the *kind*/*type*/*sort* variety. A typical property of these nominals is that the determiner may show agreement with the rightmost noun, as in *these sort of problems* and *those kind of pitch changes*, rather than with the noun that it immediately precedes. To model this, Maekawa considers the option of treating *of* and the immediately preceding noun as weak heads, but dismisses it, since it has the unwanted effect of treating *kind*/*type*/*sort* as plural. As an alternative, he develops an analysis in which *of* and the preceding noun are functors (Maekawa 2015: 149). This yields a plural nominal, but without the side-effect of treating *kind*/*type*/*sort* as plural.

# **3.5 Other nominals with idiosyncratic properties**

There are many more types of nominals with idiosyncratic properties that I cannot fully survey here. Instead, I mention some that have been analyzed in HPSG terms and add pointers to the relevant literature.

A much-studied nominal with idiosyncratic properties is the Binominal Noun Phrase Construction (BNPC), exemplified in (49).

	- b. She had [a skullcracker of a headache].

In contrast to ordinary [NP–*of* –NP] sequences, as in *the dog of my neighbor*, where the first nominal is the head of the entire NP, and where the second nominal is part of its PP adjunct, the relation between the nominals is a predicative one in the BNPC: her husband is claimed to be a nitwit, and the headache is claimed to be like a skullcracker. HPSG treatments of the BNPC are provided in Kim & Sells (2014) and Van Eynde (2018). The latter extends the phrase type hierarchy in Figure 17, defining the BNPC as a maximal type that inherits from


*head-independent-phrase* and *inverted-predication*. To capture the intuition that the second nominal is the head of the entire NP, the preposition *of* is treated as a functor that selects a nominal head, as in Maekawa's treatment of the preposition in *these sort of problems*; see Section 3.4.

Another special kind of nominal is apposition. It comes in (at least) two types, known as close apposition and loose apposition. Relevant examples are given in (50).

	- b. [Sarajevo, the capital of Bosnia,] is where WWI began.

Both types are compared and analyzed in Kim (2012) and Kim (2014). Van Eynde & Kim (2016) provide an analysis of loose apposition in the Sign-Based Construction Grammar framework.

Comparable to the nominals with a verbal core, such as gerunds and nominalized infinitives, are nominals with an adjectival core, as in *the very poor* and *the merely skeptical*. They are described and given an HPSG analysis in Arnold & Spencer (2015).

Also idiosyncratic are nominals with an extracted *wh*-word, as in the French (51) and the Dutch (52).


'What kind of strange noises are those?'

The French example is analyzed in Abeillé et al. (2004: 20–21) and the Dutch one in Van Eynde (2004: 47–50). Other kinds of discontinuous NPs are treated in De Kuthy (2002).

While all of the above are idiosyncratic in at least one respect, opinions diverge about predicative nominals. They are claimed to be special by Ginzburg & Sag (2000: 409), who employ a lexical rule mapping nominal lexemes onto predicative nouns and extending their valence with an unexpressed subject that is identified with an argument of the predicate-selecting verb. In (53), for instance, *Leslie* is treated as the subject of *sister*, and the copula as a semantically vacuous subject-raising verb.<sup>24</sup>

<sup>24</sup>Müller (2009: 225) also extends the valence with an unexpressed subject, but models this in terms of a non-branching phrasal projection, rather than by a lexical rule.


(53) Leslie is my sister.

Alternatively, Van Eynde (2015: 158–163) treats the copula as a relation that assigns the THEME role to its subject and the ATTRIBUTE role to its predicative complement. In that analysis, predicative nominals are treated along the same lines as ordinary nominals.

# **4 Conclusion**

This chapter has provided a survey of how nominals are analyzed in HPSG. Over time three treatments have taken shape: the specifier treatment, the DP treatment and the functor treatment. Each was presented and applied to ordinary nominals in Section 2. A comparison showed that treatments that adopt the NP approach fit in better with the surface-oriented monostratal character of HPSG than the DP treatment does. I then turned to nominals with idiosyncratic properties in Section 3. Since their analysis often requires a relaxation of the strictly lexicalist stance of early HPSG, I first introduced some basic notions of Constructional HPSG and then applied these notions to such idiosyncratic nominals as the gerund, the Big Mess Construction and irregular P+NOM combinations. Some of these analyses adopt the specifier treatment, others the functor treatment. When both are available, as in the case of the Big Mess Construction and irregular P+NOM combinations, the functor treatment seems more plausible. Finally, I have added pointers to relevant literature for other nominals with idiosyncratic properties, such as the Binominal Noun Phrase Construction, apposition, nominals with an adjectival core and discontinuous NPs.

# **Acknowledgments**

For their comments on earlier versions of this chapter I would like to thank Liesbeth Augustinus, two anonymous reviewers and the editors of the handbook. My contributions to the HPSG framework span nearly three decades now, starting with work on an EU-financed project to which Valerio Allegranza, Doug Arnold and Louisa Sadler also contributed (Van Eynde & Schmidt 1998). Besides the EU funding, it was conversations with Danièle Godard and Ivan Sag that convinced me of the merits and potential of HPSG. I became a regular at the annual HPSG conferences by 1998 and started teaching it in Leuven around the same time. A heartfelt thanks goes to the colleagues who over the years have created such a stimulating environment to work in.


# **References**




# **Chapter 9**

# **Argument structure and linking**

Anthony R. Davis, Southern Oregon University

Jean-Pierre Koenig, University at Buffalo

Stephen Wechsler, The University of Texas

In this chapter, we discuss the nature and purpose of argument structure in HPSG, focusing on the problems that theories of argument structure are intended to solve, including: (1) the relationship between semantic arguments of predicates and their syntactic realizations, (2) the fact that lexical items can occur in more than one syntactic frame (so-called valence or diathesis alternations), and (3) argument structure as the locus of binding principles. We also discuss cases where the argument structure of a verb includes more elements than predicted from the meaning of the verb, as well as rationales for a lexical approach to argument structure.

# **1 Introduction**

For a verb or other predicator to compose with the phrases or pronominal affixes expressing its semantic arguments, the grammar must specify the mapping between the semantic participant roles and syntactic dependents of that verb. For example, the grammar of English indicates that the subject of *eat* fills the eater role and the object of *eat* fills the role of the thing eaten. In HPSG, this mapping is usually broken down into two simpler mappings by positing an intermediate representation called ARG-ST (argument structure). The first mapping

Anthony R. Davis, Jean-Pierre Koenig & Stephen Wechsler. 2021. Argument structure and linking. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 315–367. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599834


connects the participant roles within the semantic CONTENT with the elements of the value of the ARG-ST feature; here we will call the theory of this mapping *linking theory* (see Section 4). The second mapping connects those ARG-ST list elements to the elements of the valence lists, namely COMPS (complements), SUBJ (subject), and SPR (specifier); we will refer to this second mapping as *argument realization* (see Section 3).<sup>1</sup> These two mappings are illustrated with the simplified lexical sign for the verb *eat* in (1) (for ease of presentation, we use a standard predicate-calculus representation of the value of CONTENT in (1) rather than the attribute-value representation we introduce later on).

(1) Lexical sign for the verb *eat*:

$$\begin{bmatrix}
\text{PHON} & \langle \textit{eat} \rangle \\
\text{SUBJ} & \langle \boxed{1} \rangle \\
\text{COMPS} & \langle \boxed{2} \rangle \\
\text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}_i, \boxed{2}\,\text{NP}_j \right\rangle \\
\text{CONTENT} & \textit{eat}(i, j)
\end{bmatrix}$$
In (1), "NP" abbreviates a feature description representing syntactic and semantic information about a nominal phrase. The variables and are the referential indices for the eater and eaten arguments, respectively, of the *eat* relation. The semantic information in NP semantically restricts the value or referent of .

The ARG-ST feature plays an important role in HPSG grammatical theory. In addition to regulating the mapping from semantic arguments to grammatical relations, ARG-ST is the locus of the theories of anaphoric binding and other construal relations such as control and raising. (This chapter focuses on the function of ARG-ST in semantic mapping, with some discussion of binding and other construal relations only insofar as they interact with that mapping. A more detailed look at binding is presented in Müller (2021a), Chapter 20 of this volume. Control and raising is the topic of Chapter 12 (Abeillé 2021).)

In HPSG, verb diathesis alternations, voice alternations, and derivational processes such as category conversions are all captured within the lexicon (see Section 5 and Davis & Koenig 2021, Chapter 4 of this volume). The different variants of a word are grammatically related either through lexical rules or by means of the lexical type hierarchy. HPSG grammars explicitly capture paradigmatic relations between word variants, making HPSG a *lexical approach to argument structure*, in the sense of Müller & Wechsler (2014). This fundamental property of lexicalist theories contrasts with many transformational approaches, where

<sup>1</sup>Some linguists, such as Levin & Rappaport Hovav (2005), use the term "argument realization" more broadly, to encompass linking as well.


such relationships are treated syntagmatically, through operations on phrasal structures representing sentences and other syntactic constituents. Arguments for the lexical approach are reviewed in Section 8.

Within the HPSG framework presented here, we will formulate and address a number of empirical and theoretical questions:


These questions will be addressed below in the course of presenting the theory. We begin by considering ARG-ST itself (Section 2), followed by the mapping from ARG-ST to valence lists (Section 3), and the mapping from CONTENT to ARG-ST (Section 4). The remaining sections address further issues relating to argument structure: the nature of argument alternations, extending the ARG-ST attribute to include additional elements, whether ARG-ST is a universal feature of languages, and a comparison of the lexicalist view of argument structure presented here with phrasal approaches.

# **2 The representation of argument structure in HPSG**

In the earliest versions of HPSG, the selection of dependent phrases was specified in the SUBCAT feature of the head word (Pollard & Sag 1987, Pollard & Sag 1994: Chapters 1–8). The value of SUBCAT is a list of items, each of which corresponds to the SYNSEM value of a complement or subject. The following are SUBCAT features for an intransitive verb, a transitive verb, and a transitive verb with obligatory PP complement:

(2) a. [SUBCAT ⟨NP⟩] (an intransitive verb)
    b. *eat*: [SUBCAT ⟨NP, NP⟩]
    c. *put*: [SUBCAT ⟨NP, NP, PP⟩]


Phrase structure rules in the form of immediate dominance schemata identify a certain daughter node as the head daughter (HEAD-DTR) and others, including subjects, as complement daughters (COMP-DTRS). In keeping with the *Subcategorization Principle*, here paraphrased from Pollard & Sag (1994: 34), list items are effectively "canceled" from the SUBCAT list as complement phrases, including the subject, are joined with the selecting head:

(3) Subcategorization Principle: In a headed phrase, the SUBCAT value of the HEAD-DTR (head daughter) is the concatenation of the phrase's SUBCAT list with the list of SYNSEM values of the COMP-DTRS (complement daughters).

Phrasal positions are distinguished by their saturation level: "VP" is defined as a verbal projection whose SUBCAT list contains a single item, corresponding to the subject, and "S" is defined as a verbal projection whose SUBCAT list is empty.
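The cancellation mechanics of (3) can be made concrete with a small Python sketch. This is a toy model under simplifying assumptions (categories as plain strings, signs as dictionaries); the function name `combine` is illustrative and not part of any HPSG formalism or implementation:

```python
# Toy model of SUBCAT cancellation per the Subcategorization Principle.
# Categories are strings; a "sign" is a dict. Illustrative only.

def combine(head, comp_dtrs):
    """Build a headed phrase: the mother's SUBCAT is the head daughter's
    SUBCAT minus the SYNSEM values of the complement daughters."""
    subcat = list(head["SUBCAT"])
    for dtr in comp_dtrs:
        subcat.remove(dtr)          # "cancel" one matching requirement
    return {"SUBCAT": subcat}

put = {"SUBCAT": ["NP", "NP", "PP"]}    # cf. (2c)

vp = combine(put, ["NP", "PP"])         # head combines with its complements
print(vp)                               # {'SUBCAT': ['NP']} -- a "VP"
s = combine(vp, ["NP"])                 # the subject is added last
print(s)                                # {'SUBCAT': []}     -- an "S"
```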

The "subject" of a verb, a distinguished dependent with respect to construal processes such as binding, control, and raising, was then defined as the first item in the SUBCAT list, hence the last item with which the verb combines. However, defining "subject" as the last item to combine with the head proved inadequate (Pollard & Sag 1994: Chapter 9). There are many cases where the dependent displaying subject properties need not be the last item added to the head projection. For example, in German the subject is a nominal in nominative case (Reis 1982), but the language allows subjectless clauses containing only a dative or genitive non-subject NP. If that oblique NP is the only NP dependent to combine with the verb, then it is *ipso facto* the last NP to combine, yet such obliques lack the construal properties of subjects in German.

Consequently, the SUBCAT list was split into two valence lists, a SUBJ list of length zero or one for subjects, and a COMPS list for complements. Nonetheless, certain grammatical phenomena, such as binding and other construal processes, must still be defined on a single list comprising both subject and complements (Manning et al. 1999). Additionally, some syntactic arguments are unexpressed or realized by affixal pronouns, rather than as subject or complement phrases. The new list containing all the syntactic arguments of a predicator was named ARG-ST (argument structure).

In clauses without implicit, affixal, or extracted arguments, the ARG-ST list is the concatenation of the SUBJ and COMPS lists, in that order. For example, the SUBCAT list for *put* in (2c) is replaced with the following:


$$\text{(4)}\quad\begin{bmatrix}
\text{PHON} & \langle \textit{put} \rangle \\
\text{SUBJ} & \langle \boxed{1} \rangle \\
\text{COMPS} & \langle \boxed{2}, \boxed{3} \rangle \\
\text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}, \boxed{2}\,\text{NP}, \boxed{3}\,\text{PP} \right\rangle
\end{bmatrix}$$

The idealization according to which ARG-ST is the concatenation of SUBJ and COMPS is canonized as the *Argument Realization Principle* (ARP) (Sag et al. 2003: 494). Systematic exceptions to the ARP, that is, dissociations between VALENCE and ARG-ST, are discussed in Section 3.2 below.
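The ARP can be pictured as a well-formedness check on lexical signs. The sketch below keeps the toy dictionary encoding used above; `satisfies_arp` is an invented name, and bracketed tags like `NP[1]` merely stand in for structure sharing:

```python
# Minimal sketch of the Argument Realization Principle: in a canonical
# sign, ARG-ST is the concatenation of SUBJ and COMPS. Toy encoding.

def satisfies_arp(sign):
    return sign["ARG-ST"] == sign["SUBJ"] + sign["COMPS"]

put = {"SUBJ": ["NP[1]"], "COMPS": ["NP[2]", "PP[3]"],
       "ARG-ST": ["NP[1]", "NP[2]", "PP[3]"]}          # cf. (4)
print(satisfies_arp(put))    # True: a canonical sign

# A sign with an affixal subject (cf. Spanish hablo, discussed below):
hablo = {"SUBJ": [], "COMPS": ["NP[1]"],
         "ARG-ST": ["NP[ppro.1sg]", "NP[1]"]}
print(satisfies_arp(hablo))  # False: a non-canonical sign
```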

A predicator's valence lists indicate its requirements for syntactic concatenation with dependents (Section 3). ARG-ST, meanwhile, provides syntactic information about the expression of semantic roles and is related, via linking theory, to the lexical semantics of the word (Section 3.2). The ARG-ST list contains specifications for the union of the verb's local phrasal dependents (the subject and complements, whether they are semantic arguments, raised phrases, or expletives) and its arguments that are not realized locally, whether they are unbounded dependents, affixes, or unexpressed arguments.

Figure 1 provides a schematic representation of linking and argument realization in HPSG, illustrated with the verb *donate*, as in *Mary donated her books to the library*. Linking principles govern the mapping of participant roles in a predicator's CONTENT to elements of the ARG-ST list. Argument realization is shown in this figure only for mapping to valence, which represents locally realized phrasal dependents; affixal and null arguments are not depicted (but are discussed below). The ARG-ST and valence lists in this figure contain only arguments linked to participant roles, but in Section 6 we discuss proposals for extending ARG-ST to include additional elements. In Section 3, we examine cases where the relationship between ARG-ST and valence violates the ARP.

# **3 Argument realization: The mapping between ARG-ST and valence lists**

# **3.1 Variation in the expression of arguments**

The valence features SUBJ and COMPS are responsible for composing a verb with its dependents, but this is just one of the ways that semantic arguments of a verb are expressed in natural language. Semantic arguments can be expressed


Figure 1: Linking and argument realization in HPSG, illustrated with the verb *donate*

in various linguistic forms: as local syntactic dependents (SUBJ and COMPS), as affixes, or displaced in unbounded dependency constructions (SLASH).

Affixal arguments can be illustrated with the first person singular Spanish verb *hablo* 'speak.1SG', as in (5).

(5) a. Habl-o    español.  (Spanish)
       speak-1SG Spanish
       'I speak Spanish.'

    b. *hablo* 'speak.1SG':

$$\begin{bmatrix}
\text{PHON} & \langle \textit{hablo} \rangle \\
\text{SUBJ} & \langle \rangle \\
\text{COMPS} & \langle \boxed{1} \rangle \\
\text{ARG-ST} & \left\langle \text{NP:}\begin{bmatrix} \textit{ppro} \\ \text{INDEX} & \begin{bmatrix} \text{PERS} & \textit{1st} \\ \text{NUM} & \textit{sg} \end{bmatrix} \end{bmatrix}, \boxed{1}\,\text{NP} \right\rangle
\end{bmatrix}$$

The *-o* suffix contributes the first person singular pronominal subject content to the verb form (the morphological process is not shown here; see Crysmann 2021, Chapter 21 of this volume). The pronominal subject appears on the ARG-ST list


and hence is subject to the binding theory. But it does not appear in SUBJ if no subject NP appears in construction with the verb.

A lexical sign whose ARG-ST list is just the concatenation of its SUBJ and COMPS lists conforms to the Argument Realization Principle (ARP); such signs are called *canonical signs* by Bouma et al. (2001). Non-canonical signs, which violate the ARP, have been approached in two ways. In one approach, a lexical rule takes as input a canonical entry and derives a non-canonical one by removing items from the valence lists, while adding an affix or designating an item as an unbounded dependent by placement on the SLASH list. In the other approach, a feature of each ARG-ST list item specifies whether the item is subject to the ARP (hence mapped to a valence list), or ignored by it (hence expressed in some other way). See Davis & Koenig (2021), Chapter 4 of this volume for more detail on the lexicon and Miller & Sag (1997) for a treatment of French clitics as affixes.
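The lexical-rule approach just mentioned can be sketched as a function from canonical to non-canonical entries. The following is a deliberately simplified illustration in the toy encoding used above, not Bouma et al.'s (2001) actual formalization:

```python
# Sketch of a lexical rule deriving a non-canonical entry: a COMPS item
# is removed from valence and placed on SLASH, while ARG-ST is unchanged.
import copy

def extraction_lexical_rule(entry, n=0):
    """Designate the n-th COMPS item as an unbounded dependent."""
    out = copy.deepcopy(entry)
    gap = out["COMPS"].pop(n)
    out.setdefault("SLASH", []).append(gap)
    return out

eat = {"SUBJ": ["NP[1]"], "COMPS": ["NP[2]"],
       "ARG-ST": ["NP[1]", "NP[2]"], "SLASH": []}
eat_gap = extraction_lexical_rule(eat)
print(eat_gap["COMPS"], eat_gap["SLASH"])   # [] ['NP[2]']
print(eat_gap["ARG-ST"])                    # unchanged: ['NP[1]', 'NP[2]']
```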

A final case to consider is null anaphora, in which a semantic argument is simply left unexpressed and receives a definite pronoun-like interpretation. Japanese *mi-* 'see' is transitive but the object NP can be omitted as in (6).

(6) Naoki-ga  mi-ta.  (Japanese)
    Naoki-NOM see-PST
    'Naoki saw it/him/her/*himself.'

Null anaphors of this kind typically arise in discourse contexts similar to those that license ordinary weak pronouns, and the unexpressed object often has the obviation effects characteristic of overt pronouns, as shown in (6). HPSG eschews the use of silent formatives like "small *pro*" when there is no evidence for such items, such as local interactions with the phrase structure. Instead, null anaphors of this kind are present in ARG-ST but absent from valence lists. ARG-ST is directly linked to the semantic CONTENT and is the locus of Binding Theory, so the presence of a syntactic argument on the ARG-ST list but not a valence list accounts for null anaphora. To account for obviation, the ARG-ST list item, when unexpressed, receives the binding feature of ordinary (non-reflexive) pronouns, usually *ppro*. This language-specific option can be captured in a general way by valence and ARG-ST defaults in the lexical hierarchy for verbs.

# **3.2 The syntax of ARG-ST and its relation to valence lists**

The ordering of members of the ARG-ST list represents a preliminary syntactic structuring of the set of argument roles. In that sense, ARG-ST functions as an interface between the lexical semantics of the verb and the expressions of dependents as described in Section 3. Its role thus bears some relation to the initial



stratum in Relational Grammar (Perlmutter & Postal 1984), *argument structure* (including intrinsic classifications) in Lexical Mapping Theory in LFG (Bresnan et al. 2016), macroroles in Role and Reference Grammar (Van Valin & LaPolla 1997), D-structure in Government and Binding Theory, and the Merge positions of arguments in Minimalism, assuming in the last two cases the Uniformity of Theta Assignment Hypothesis (UTAH) (Baker 1988: 46) or something similar. However, it also differs from all of those in important ways.

Semantic constraints on ARG-ST are explored in Section 4 below. But ARG-ST represents not only semantic distinctions between the arguments, but also syntactic ones. Specifically, the list ordering represents relative syntactic *obliqueness* of arguments. The least oblique argument is the subject (SUBJ), followed by the complements (COMPS). Following Manning (1996), term arguments (direct arguments, i.e., subjects and objects) are assumed to be less oblique than "oblique" arguments (adpositional and oblique case marked phrases), followed finally by predicate and clausal complements. The transitive ordering relation on the ARG-ST list is called *o-command* (obliqueness command): the list item that corresponds to the subject o-commands those corresponding to complements; a list item corresponding to an object o-commands those corresponding to any obliques; and so on (see Müller 2021a, Chapter 20 of this volume for details).

Relative obliqueness conditions a number of syntactic processes and phenomena, including anaphoric binding. The o-command relation replaces c-command in the Principles A, B, and C of Chomsky's (1981) configurational theory of binding. For example, HPSG's Principle B states that an ordinary pronoun cannot be o-commanded by its coargument antecedent, which accounts for the pronoun obviation observed in the English sentence *Naoki saw him*, where *him* cannot be construed as referring to Naoki, and also accounts for obviation in the Japanese sentence (6) above.
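Under this list-based view, o-command reduces to relative position on ARG-ST. The sketch below is our own simplification, with invented function names, assuming ARG-ST is ordered by increasing obliqueness:

```python
# Toy rendering of o-command and Principle B on an ARG-ST list.

def o_commands(pos_a, pos_b):
    """An ARG-ST item o-commands every item more oblique (further right)."""
    return pos_a < pos_b

def principle_b_ok(antecedent_pos, pronoun_pos):
    """Principle B: an ordinary pronoun must not be o-commanded by a
    coargument antecedent."""
    return not o_commands(antecedent_pos, pronoun_pos)

# "Naoki saw him": ARG-ST <NP:Naoki, NP:him>. The subject (position 0)
# o-commands the pronoun (position 1), so coreference is excluded.
print(principle_b_ok(antecedent_pos=0, pronoun_pos=1))   # False
```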

Relative obliqueness also conditions the accessibility hierarchy of Keenan & Comrie (1977), according to which a language allowing relativization of some type of dependent also allows relativization of any dependent less oblique than it. Hence if a language has relative clauses at all, it has subject relatives; if it allows obliques to relativize, then it also allows subject and object relatives; and so on. Similar implicational universals apply to verb agreement with subjects, objects, and obliques (Greenberg 1966).

Returning now to argument realization, we saw above that the rules for the selection of the subject from among the verb's arguments are also stated in terms of the ARG-ST list. In a canonical realization the subject is the first list item, o-commanding all of its coarguments. In various non-canonical circumstances, such as those we noted above, o-command relations do not correspond to ordering on the valence lists, and this can be reflected in phenomena such as anaphoric


binding. In the following section we examine another kind of non-canonical relationship between ARG-ST and valence in more detail: syntactic ergativity, exemplified by Balinese.

# **3.3 Syntactic ergativity**

The autonomy of ARG-ST from the valence lists is further illustrated by cross-linguistic variation in the mapping between them. As just noted, in English and many other languages, the initial item in ARG-ST maps to the subject. However, languages with so-called *syntactically ergative* clauses have been analyzed as following a different mapping rule. Crucially, the ARG-ST ordering in those languages is still supported by independent evidence from properties such as binding and the NP versus PP categorial status of arguments. Balinese (Austronesian), as analyzed by Wechsler & Arka (1998), is such a language. In the morphologically unmarked and most common voice, called *Objective Voice* (OV), the subject is any term *except* the ARG-ST-initial one.

Balinese canonically has SVO order, regardless of the verb's voice form (Artawa 1994; Wechsler & Arka 1998). The preverbal NPs in (7) are the surface subjects and the postverbal ones are complements. When the verb appears in the unmarked objective voice (OV), a non-initial term is the subject, as in (7a). But verbs in the *Agentive Voice* (AV) select as their subject the ARG-ST-initial item, as in (7b).

(7) a. Bawi adol    ida.  (Balinese)
       pig  OV.sell 3SG
       'He/She sold a pig.'

    b. Ida ng-adol bawi.
       3SG AV-sell pig
       'He/She sold a pig.'

A ditransitive verb, such as the benefactive applied form of *beli* 'buy' in (8), has three term arguments on its ARG-ST list. The subject can be either term that is non-initial in ARG-ST:

(8) b. I   Wayan beli-ang=a    potlote    ento.
       ART Wayan OV.buy-APPL=3 pencil.DEF that
       '(S)he bought Wayan the pencil.'


Wechsler and Arka argue that Balinese voice alternations do not affect ARG-ST list order. Thus the agent argument can bind a coargument reflexive pronoun (but not vice versa), regardless of whether the verb is in OV or AV form:

(9) a. Ida ny-ingakin ragan idane.  (Balinese)
       3SG AV-see     SELF
       '(S)he saw himself/herself.'

    b. Ragan idane cingakin ida.
       SELF        OV.see   3SG
       '(S)he saw himself/herself.'

The 'seer' argument o-commands the 'seen', with the AV versus OV voice forms regulating subject selection.
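A minimal sketch of this voice-driven subject selection, under the simplifying assumption that ARG-ST can be modeled as a plain list of term arguments (the encoding and function name are ours, not Wechsler & Arka's formalization):

```python
# Toy model: AV selects the ARG-ST-initial term as subject; OV selects a
# non-initial term. ARG-ST order, and hence binding, is unaffected.

def realize(arg_st, voice, subject_index=1):
    if voice == "AV":
        idx = 0
    else:                                   # OV
        assert subject_index != 0, "OV subject must be non-initial"
        idx = subject_index
    comps = [a for k, a in enumerate(arg_st) if k != idx]
    return {"SUBJ": [arg_st[idx]], "COMPS": comps, "ARG-ST": arg_st}

see = ["NP:seer", "NP:seen"]
print(realize(see, "AV"))   # seer is subject, as in (9a)
print(realize(see, "OV"))   # seen is subject, as in (9b)
```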



Languages like Balinese illustrate the autonomy of ARG-ST. Although the agent binds the patient in both (9a) and (9b), the binding conditions cannot be stated directly on the thematic hierarchy. For example, in HPSG a raised argument appears on the ARG-ST list of the raising verb, even though that verb assigns no thematic role to that list item. But a raised subject can bind a coargument reflexive in Balinese (this is comparable to English *John seems to himself to be ugly*). Anaphoric binding in Balinese raising constructions thus behaves as predicted


by the ARG-ST based theory (Wechsler 1999). In conclusion, neither valence lists nor CONTENT provides the right representation for defining binding conditions, but ARG-ST fits the bill.

Syntactically ergative languages besides Balinese that have been analyzed as using an alternative mapping between ARG-ST and valence include Tagalog, Inuit, some Mayan languages, Chukchi, Toba Batak, Tsimshian languages, and Nadëb (Manning 1996; Manning et al. 1999).

Interestingly, while the GB/Minimalist configurational binding theory may be defined on analogues of the valence lists or CONTENT, those theories lack any analogue of ARG-ST. This leads to special problems for such theories in accounting for binding in many Austronesian languages like Balinese. In transformational theories since Chomsky (1981), anaphoric binding conditions are usually stated with respect to the A-positions (argument positions). A-positions are analogous to HPSG valence list items, with relative c-command in the configurational structure corresponding to relative list ordering in HPSG, in the simplest cases. Meanwhile, to account for data similar to (9), where agents asymmetrically bind patients, Austronesian languages like Balinese were said to define binding on the "thematic structure" encoded in d-structure or Merge positions, where agents asymmetrically c-command patients regardless of their surface positions (Guilfoyle et al. 1992). But the interaction with raising shows that neither of those levels is appropriate as the locus of binding theory (Wechsler 1999).<sup>2</sup>

# **3.4 Symmetrical objects**

We have thus far tacitly assumed a total ordering of elements on the ARG-ST list, but Ackerman, Malouf & Moore (2013; 2017) propose a partial ordering for certain so-called *symmetrical object languages*. In Moro (Kordofanian), the two term complements of a ditransitive verb have exactly the same object properties. Relative linear order of the theme and goal arguments is free, as shown by the two translations of (12) (from Ackerman et al. 2017: 9; CL 'noun class'; SM 'subject marker').

(12) é-g-a-natʃ-ó              óráŋ    ŋeɾá      (Moro)
     1SG.SM-CLg-MAIN-give-PFV CLg.man CLŋ.girl
     'I gave the girl to the man.' / 'I gave the man to the girl.'

More generally, the two objects have identical object properties with respect to occurrence in post-predicate position, case marking, realization by an object marker, and ability to undergo passivization (Ackerman et al. 2017: 9).

<sup>2</sup>To account for (9b) under the configurational binding theory, the subject position must be an A-bar position, but to account for binding by a raised subject, it must be an A-position. See Wechsler (1999).


Ackerman et al. (2017) propose that the two objects are unordered on the ARG-ST list. This allows for two different mappings to the COMPS list, as shown here:

(13) a. Goal argument as primary object:

$$\begin{bmatrix}
\text{SUBJ} & \langle \boxed{1} \rangle \\
\text{COMPS} & \langle \boxed{2}, \boxed{3} \rangle \\
\text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}_i, \boxed{2}\,\text{NP}_j, \boxed{3}\,\text{NP}_k \right\rangle \\
\text{CONTENT} & \begin{bmatrix} \textit{give-rel} \\ \text{AGENT} & i \\ \text{GOAL} & j \\ \text{THEME} & k \end{bmatrix}
\end{bmatrix}$$

b. Theme argument as primary object:

$$\begin{bmatrix}
\text{SUBJ} & \langle \boxed{1} \rangle \\
\text{COMPS} & \langle \boxed{3}, \boxed{2} \rangle \\
\text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}_i, \boxed{2}\,\text{NP}_j, \boxed{3}\,\text{NP}_k \right\rangle \\
\text{CONTENT} & \begin{bmatrix} \textit{give-rel} \\ \text{AGENT} & i \\ \text{GOAL} & j \\ \text{THEME} & k \end{bmatrix}
\end{bmatrix}$$

The primary object properties, which are associated with the initial term argument of COMPS, can go with either the goal or theme argument.
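The effect of leaving the two objects unordered can be pictured as licensing any linearization of them in COMPS. The sketch below uses our own toy list encoding, not Ackerman et al.'s formalization:

```python
# Sketch: with a partial order on ARG-ST, both COMPS orders in (13) are
# licensed, so either object can be the primary (COMPS-initial) object.
from itertools import permutations

def comps_orders(objects):
    """All linearizations of mutually unordered object arguments."""
    return [list(p) for p in permutations(objects)]

subj, objs = "NP[agent]", ["NP[goal]", "NP[theme]"]
for comps in comps_orders(objs):
    print({"SUBJ": [subj], "COMPS": comps})
# {'SUBJ': ['NP[agent]'], 'COMPS': ['NP[goal]', 'NP[theme]']}
# {'SUBJ': ['NP[agent]'], 'COMPS': ['NP[theme]', 'NP[goal]']}
```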

To summarize this section, while the relationship between ARG-ST, SUBJ, and COMPS lists was originally conceived as a straightforward one, enabling binding principles to maintain their simple form by defining ARG-ST as the concatenation of the other two, the relationship was soon loosened. Looser relationships between ARG-ST and the valence lists are invoked in accounts of several core syntactic phenomena. Arguments not realized overtly in their canonical positions due to extraction, cliticization, or pro-drop (null anaphora) appear on ARG-ST but not in any valence list. Accounts of syntactic ergativity in HPSG involve variations in the mapping between ARG-ST and valence lists; in particular, the element of SUBJ is not, in such languages, the first element of ARG-ST. Modifications of ARG-ST play a role in some treatments of passivization, where its expected first element is suppressed, and in languages with multiple, symmetric objects, where a partial rather than total ordering of ARG-ST elements has been postulated (see Section 5.3 for details on the analysis of passives in HPSG). Thus ARG-ST has now acquired an autonomous status within HPSG, and is not merely a predictable rearrangement of information present in the valence lists.


# **4 Linking: the mapping between semantics and ARG-ST**

# **4.1 HPSG approaches to linking**

The term *linking* refers to the mapping specified in a lexical entry between participant roles in the semantics and their syntactic representations on the ARG-ST list. Early HPSG grammars stipulated the linking of each verb: semantic CONTENT values with predicator-specific attributes like DEVOURER and DEVOURED were mapped to the subject and object, respectively, of the verb *devour*. But linking follows general patterns across verbs, and across languages; e.g., if one argument of a transitive verb in active voice has an agentive role, it will map to the subject, not the object, except in the syntactically ergative languages described in Section 3.3 above, and in those languages the linking is just as regularly reversed. Those early HPSG grammars did not capture the regularities across verbs.

To capture those regularities, HPSG researchers beginning with Wechsler (1995b) and Davis (1996) formulated linking principles stated on more general semantic properties that hold across verbs.

Within the history of linguistics, there have been three general approaches to modeling the lexico-semantic side of linking: thematic role types (Pāṇini ca. 400 B.C., Fillmore 1968); lexical decomposition (Foley & Van Valin 1984; Rappaport Hovav & Levin 1998); and the proto-roles approach (Dowty 1991). In developing linking theories within the HPSG framework, Wechsler (1995b) and Davis (1996) employed a kind of lexical decomposition that also incorporated some elements of the proto-roles approach. The reasons for preferring this over the alternatives are discussed in Section 4.4 below.

Wechsler's (1995b) linking theory constrains the relative order of pairs of arguments on the ARG-ST list according to semantic relations entailed between them. For example, his *notion rule* states that if one participant in an event is entailed to have a mental notion of another, then the first must precede the second on the ARG-ST list. The *conceive-pred* type is defined by the following type declaration (based on Wechsler 1995b: 127, with formal details adjusted for consistency with current usage):

(14) *conceive-pred*:

$$\begin{bmatrix}
\text{ARG-ST} & \left\langle \text{NP}_i, \text{NP}_j \right\rangle \\
\text{CONTENT} & \begin{bmatrix} \textit{conceive-rel} \\ \text{CONCEIVER} & i \\ \text{CONCEIVED} & j \end{bmatrix}
\end{bmatrix}$$

This accounts for a host of linking facts in verbs as varied as *like*, *enjoy*, *invent*, *claim*, and *murder*, assuming these verbs belong to the type *conceive-pred*.


It explains the well-known contrast between experiencer-subject *fear* and experiencer-object *frighten* verbs: *fear* entails that its subject has some notion of its object, so *The tourists feared the lumberjacks* entails that the tourists are aware of the lumberjacks. But the object of *frighten* need not have a notion of its subject: in *The lumberjacks frightened the tourists (by cutting down a large tree that crashed right in front of them)*, the tourists may not be aware of the lumberjacks' existence.
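The notion rule can be pictured as a pairwise ordering check over ARG-ST, with the relevant entailments stipulated by hand for each verb class. The sketch below uses a toy encoding and invented names, not Wechsler's formalization:

```python
# Sketch of the notion rule: if x is entailed to have a notion of y,
# x must precede y on ARG-ST. Entailments are listed by hand.

def notion_rule_ok(arg_st, notion_pairs):
    """notion_pairs: set of (x, y) such that x has a notion of y."""
    pos = {arg: i for i, arg in enumerate(arg_st)}
    return all(pos[x] < pos[y] for x, y in notion_pairs)

# fear: the experiencer subject has a notion of the object.
entailed = {("NP:tourists", "NP:lumberjacks")}
print(notion_rule_ok(["NP:tourists", "NP:lumberjacks"], entailed))  # True
# The reversed linking would violate the constraint:
print(notion_rule_ok(["NP:lumberjacks", "NP:tourists"], entailed))  # False
```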

Two other linking rules appear in Wechsler (1995b). One states that "affected themes", that is, participants that are entailed to undergo a change, map to the object, rather than subject, of a transitive verb. Another states that when stative transitive verbs entail a part-whole relation between the two participants, the whole maps to the subject and the part to the object: for example, *X includes Y* and *X contains Y* each entail that *Y* is a part of *X*.

These linking constraints do not rely on a total ordering of thematic roles, nor on an exhaustive assignment of thematic role types to every semantic role in a predicator. Instead, a small set of partial orderings of semantic roles, based on lexical entailments, suffices to account for the linking patterns of a wide range of verbs. This insight was adopted in a slightly different guise in work by Davis (1996), Davis (2001), and Davis & Koenig (2000), who develop a more elaborated representation of lexical semantics, with which simple linking constraints can be stated. The essence of this approach is to posit a small number of dyadic semantic relations such as *act-und-rel* (actor-undergoer relation) with attributes ACT(OR) and UND(ERGOER) that serve as intermediaries between semantic roles and syntactic arguments (akin to the notion of Generalized Semantic Roles discussed in Van Valin 1999).

What are the truth conditions of *act-und-rel*? Following Fillmore (1977), Dowty (1991), and Wechsler (1995b), Davis & Koenig note that many of the pertinent lexical entailments come in related pairs. For instance, one of Dowty's entailments is that one participant causally affects another, and of course the other is entailed to be causally affected. Another involves the entailments in Wechsler's notion rule (14); one participant is entailed to have a notion of another. These entailments of paired participant types characterize classes of verbs (or other predicators), and can then be naturally represented as dyadic relations in CONTENT. Collecting those entailments, we arrive at a disjunctive statement of truth conditions:

(15) *act-und-rel*(*x*, *y*) is true iff *x* causes a change in *y*, or *x* has a notion of *y*, or …

We can designate the *x* participant in the pair as the value of ACTOR (or ACT) and *y* as the value of UNDERGOER (or UND), in a relation of type *act-und-rel*. Semantic arguments that are ACTOR or UNDERGOER will then bear at least one of the entailments characteristic of ACTORs or UNDERGOERs (Davis & Koenig 2000: 72). This then simplifies the statement of linking constraints for all of these paired participant types. Davis (1996) and Koenig & Davis (2001) argue that this obviates counting the relative number of proto-agent and proto-patient entailments, which is what Dowty (1991) had advocated.

The linking constraints (16) and (17) state that a verb whose semantic CONTENT is of type *act-und-rel* will be constrained to link the ACT participant to the first element of the verb's ARG-ST list (its subject), and the UND participant to the second element of the verb's ARG-ST list (this is analogous to Wechsler's constraints based on partial orderings). The attribute KEY selects one predication as relevant for linking, among a set of predications included in a lexical item's CONTENT; we furnish more details below.

These linking constraints can be viewed as parts of the definition of lexical types, as in Davis (2001), where each of the constraints in (16)–(18) defines a particular class of lexemes (or words).<sup>3</sup>

$$\text{(16)}\quad\begin{bmatrix}
\text{CONTENT|KEY} & \left[\text{ACT}\ \boxed{1}\right] \\
\text{ARG-ST} & \left\langle \text{NP}_{\boxed{1}}, \dots \right\rangle
\end{bmatrix}$$

$$\text{(17)}\quad\begin{bmatrix}
\text{CONTENT|KEY} & \left[\text{UND}\ \boxed{1}\right] \\
\text{ARG-ST} & \left\langle \dots, \text{NP}_{\boxed{1}}, \dots \right\rangle
\end{bmatrix}$$

$$\text{(18)}\quad\begin{bmatrix}
\text{CONTENT|KEY} & \left[\text{SOA}\ \left[\text{ACT}\ \boxed{1}\right]\right] \\
\text{ARG-ST} & \left\langle \text{NP}, \text{NP}_{\boxed{1}}, \dots \right\rangle
\end{bmatrix}$$

The first constraint, in (16), links the value of ACT (when not embedded within another attribute) to the first element of ARG-ST. The second, in (17), merely links the value of UND (again, when not embedded within another attribute) to some NP on ARG-ST. Given this understanding of how the values of ACT and UND are determined, these constraints cover the linking patterns of a wide range of transitive verbs: *throw* (ACT causes motion of UND), *slice* (ACT causes change of state in UND), *frighten* (ACT causes emotion in UND), *imagine* (ACT has a notion of UND), *traverse* (ACT "measures out" UND as an incremental theme), and *outnumber* (ACT is superior to UND on a scale).

<sup>3</sup>Alternatively, (16) (and other linking constraints) can be recast as implicational constraints on lexemes or words (Koenig & Davis 2003). (i) is an implicational constraint indicating that a word whose semantic content includes an ACTOR role must map that role to the initial item in the ARG-ST list.

$$\text{(i)}\quad \left[\text{CONTENT|KEY}\left[\text{ACT}\ \boxed{1}\right]\right] \Rightarrow \left[\text{ARG-ST}\ \left\langle \text{NP}_{\boxed{1}}, \dots \right\rangle\right]$$

The third constraint, in (18), links the value of an ACT attribute embedded within a SOA (state of affairs) attribute to an NP that is second on ARG-ST. This constraint accounts for the linking of the (primary) object of ditransitives. In English, these verbs (*give*, *hand*, *send*, *earn*, *owe*, etc.) involve (prospective) causing of possession (Pinker 1989; Goldberg 1995), and the possessor is represented as the value of the embedded ACT in (18). There could be additional constraints of a similar form in languages with a wider range of ditransitive constructions; conversely, such a constraint might be absent in languages that lack ditransitives entirely. As mentioned earlier in this section, the range of subcategorization options varies somewhat from one language to another.
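To see how constraints like (16)-(18) might operate jointly, consider the following sketch, which checks a toy lexical entry whose KEY relation is encoded as a nested dictionary. The encoding and function name are ours, not Davis & Koenig's AVM formalization:

```python
# Sketch: (16) ACT links to the first ARG-ST element, (17) UND to some
# NP on ARG-ST, (18) a SOA-embedded ACT to the second element.

def check_linking(entry):
    key, arg_st = entry["KEY"], entry["ARG-ST"]
    ok = True
    if "ACT" in key:                          # (16)
        ok &= arg_st[0]["INDEX"] == key["ACT"]
    if "UND" in key:                          # (17)
        ok &= any(a["INDEX"] == key["UND"] for a in arg_st)
    if "ACT" in key.get("SOA", {}):           # (18)
        ok &= arg_st[1]["INDEX"] == key["SOA"]["ACT"]
    return ok

# A ditransitive of caused possession: agent i, recipient j, theme k.
give = {"KEY": {"ACT": "i", "UND": "j", "SOA": {"ACT": "j"}},
        "ARG-ST": [{"INDEX": "i"}, {"INDEX": "j"}, {"INDEX": "k"}]}
print(check_linking(give))   # True
```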

The KEY attribute in (16)–(18) also requires further explanation. The formulation of linking constraints here employs the architecture used in Koenig & Davis (2006), in which the semantics represented in CONTENT values is expressed as a set of *elementary predications*, in a way similar to and inspired by Minimal Recursion Semantics (Copestake et al. 2001; 2005). Each elementary predication is a simple relation, but the relationships among them may be left unspecified. For linking, one of the elementary predications is designated the KEY, and it serves as the locus of linking. This allows us to indicate the linking of participants that play multiple roles in the denoted situation. The KEY selects one relation as the "focal point," and the other elementary predications are then irrelevant as far as linking is concerned. The choice of KEY then becomes an issue demanding consideration; we will see in the discussion of argument alternations in Section 5 how this choice might account for some alternation phenomena.

These linking constraints apply to word classes in the lexical hierarchy (see Davis & Koenig 2021, Chapter 4 of this volume). One consequence of this fact merits brief mention. Constraint (17), which links the value of UND to some NP on ARG-ST, is a specification of one class of verbs. Not all verbs (and certainly not all other predicators, such as nominalizations) with a CONTENT value containing an UND value realize it as an NP. Verbs obeying this constraint include the transitive verbs noted above, and intransitive "unaccusative" verbs such as *fall* and *persist*. But some verbs with both ACT and UND attributes in their CONTENT are intransitive, such as *impinge (on)*, *prevail (on)*, and *tinker (with)*. Interactions with other constraints, such as the requirement that verbs (in English, at least) have an NP subject, determine the range of observed linking patterns.

These linking constraints also assume that the proto-role attributes ACTOR, UNDERGOER, and SOA are appropriately matched to entailments, as described above.


Other formulations are possible, such as that of Koenig & Davis (2003), where the participant roles pertinent to each lexical entailment are represented in CONTENT by corresponding, distinct attributes.

In addition to the linking constraints, there may be some very general well-formedness conditions on linking. We rarely find verbs that obligatorily map one semantic role to two distinct members of the ARG-ST list, both expressed overtly. A verb meaning 'eat', but with that disallowed property, could appear in a ditransitive sentence like (19), with the meaning that Pat ate dinner, and his dinner was a large steak.

(19) \* Pat ate dinner a large steak.

Typically, semantic arguments map to at most one (overtly expressed) ARG-ST list item (Davis 2001: 262–268).

Having set out some general principles of linking and their implementation in HPSG, we now briefly discuss linking of oblique arguments. We also return in the remainder of this section to issues relating to lexical semantic representations as they pertain to linking. To what extent are the elements of ARG-ST determined by lexical semantics? Do HPSG lexical semantic representations require thematic roles? And how does other information in these representations, such as modality and modifier scope, affect linking?

# **4.2 Linking oblique arguments**

In this section we discuss linking of oblique arguments, that is, PPs and oblique case marked NPs. In some instances, a verb's selection of a particular preposition appears at least partly arbitrary; it is hard to explain why English speakers accept *hanker after* and *yearn for*, but not \**yearn after*. In these cases, the choice of preposition may be stipulated by the individual lexical entry. But as Gawron (1986) and Wechsler (1995a) have shown, many prepositions selected by a verb have semantic content. *For* in the above-mentioned cases, and in *look for*, *wait for*, and *aim for*, is surely not a lexical accident. And in cases like *cut with*, *with* is used in an instrumental sense, denoting a *use-rel* relation, as with verbs that either allow (*eat*) or require (*cut*) an instrument (Koenig & Davis 2006). Davis (1996; 2001) adopts the position of Gawron and Wechsler in his treatment of linking to PPs. As an example of this kind of account, the linking type in (20) characterizes a verb selecting a *with*-PP. The PP argument is linked from the RELS list rather than the KEY.


$$\text{(20)}\quad\begin{bmatrix}
\text{CONTENT} & \begin{bmatrix} \text{KEY} & \boxed{1} \\ \text{RELS} & \left\langle \boxed{1}, \boxed{2}\begin{bmatrix} \textit{use-rel} \\ \text{ACT} & a \\ \text{UND} & u \\ \text{SOA} & \boxed{1} \end{bmatrix} \right\rangle \end{bmatrix} \\
\text{ARG-ST} & \left\langle \dots, \text{PP}[\textit{with}]{:}\boxed{2} \right\rangle
\end{bmatrix}$$

Apart from the details of individual linking constraints, we have endeavored here to describe how linking can be modeled in HPSG using the same kinds of constraints used ubiquitously in the framework. Within the hierarchical lexicon (see Davis & Koenig 2021, Chapter 4 of this volume), constraints between semantically defined classes and syntactically defined ones can furnish an account of linking patterns, and there is no resort to additional mechanisms such as a thematic hierarchy or numerical comparison of entailments.

# **4.3 To what extent does meaning predict linking?**

The framework outlined above allows us to address the following question: how much of linking is strictly determined by semantic factors, and how much is left open to lexically arbitrary subcategorization specifications, or perhaps subject to other factors?

Subcategorization – the position and nature of ARG-ST elements, in HPSG terms – is evidently driven to a great extent by semantics, but debate continues about how much, and which components of semantics are involved. Views have ranged from the strict, highly constrained relationship in which lexical semantics essentially determines syntactic argument structure to a looser one in which some elements of subcategorization may be stipulated. Among the first camp are those who espouse the Uniformity of Theta Assignment Hypothesis proposed in Baker (1988: 46), which maintains that "identical thematic relationships between items are represented by identical structural relationships" in the syntax (see also Baker 1997). With regard to the source of diathesis alternations, Levin (1993: 12–13) notes that "studies of these properties suggest that argument structures might in turn be derivable to a large extent from the meaning of words", and accordingly "pursues the hypothesis of semantic determinism seriously to see just how far it can be taken".

Others, including Pollard & Sag (1987: Section 5.3) and Davis (2001: Section 5.1), have expressed caution, pointing out cases where subcategorization and diathesis alternations seem to be at least partly arbitrary. Pollard & Sag (1987: ex. 214– 215) note contrasts like these:


(21) a. Sandy spared/\*deprived Kim a second helping.
     b. Sandy \*spared/deprived Kim of a second helping.

And Davis (2001: ex. 5.4) provides these pairs of semantically similar verbs with differing subcategorization requirements:

(22) b. Homer opted for/chose a chocolate frosted donut.
     c. The music grated on/irritated the critics.

Other cases where argument structure seems not to mirror semantics precisely include raising constructions, in which one of a verb's direct arguments bears no semantic role to it at all. Similarly, overt expletive arguments cannot be seen as deriving from some participant role in a predicator's semantics. Like the examples above, these phenomena suggest that some aspects of subcategorization are specified independently of semantics.

Another point against strict semantic determination of argument structure comes from cross-linguistic observations of subcategorization possibilities. It is evident, for example, that not all languages display the same range of direct argument mappings. Some lack ditransitive constructions entirely (Halkomelem), some allow them across a limited semantic range (English), some quite generally (Georgian), and a few permit tritransitives (Kinyarwanda and Moro). Gerdts (1992) surveys about twenty languages and describes consistent patterns like these. The range of phenomena such as causative and applicative formation in a language is constrained by what she terms its "relational profile"; this includes, in HPSG terms, the number of direct NP arguments permitted on its ARG-ST lists. Again, it is unclear that underlying semantic differences across languages in the semantics of verbs meaning *give* or *write* would be responsible for these general patterns.

# **4.4 HPSG and thematic roles**

The ARG-ST list constitutes the syntactic side of the mapping between semantic roles and syntactic dependents. As ARG-ST is merely an ordered list of arguments, without any semantic "labels", it contains no counterparts to thematic role types, such as AGENT, PATIENT, THEME, or GOAL. Thematic roles like these, however, have been a mainstay of linking in Generative Grammar since Fillmore (1968) and have antecedents going back to Pāṇini. Ranking them in a *thematic hierarchy*, and labeling each of a predicator's semantic roles (e.g., EATER and FOOD for


the verb *eat*) with a unique thematic role (e.g., AGENT and PATIENT for *eat*), then yields an ordering of roles analogous to the ordering on the ARG-ST list. Indeed, it would not be difficult to import this kind of system into HPSG, as a means of determining the order of elements on the ARG-ST list. However, HPSG researchers have generally avoided using a thematic hierarchy, for reasons we now briefly set out.

Fillmore (1968) and many others thereafter have posited a small set of disjoint thematic roles, with each of a predicator's participant roles assigned exactly one thematic role. Thematic hierarchies depend on these properties for a consistent linking theory, but they do not hold up well to formal scrutiny. Jackendoff (1987) and Dowty (1991) note (from somewhat different perspectives) that numerous verbs have arguments not easily assigned a thematic role from the typically posited inventory (e.g., the objects of *risk*, *blame*, and *avoid*), that more than one argument might sensibly be assigned the same role (e.g., the subjects and objects of *resemble*, *border*, and some alternants of commercial transaction verbs), and that multiple roles can be sensibly assigned to a single argument (the subjects of verbs of volitional motion such as *jump* or *flee* are both an AGENT and a THEME). In addition, consensus on the inventory of thematic roles has proven elusive, and some, notoriously THEME, have resisted clear definition. Work in formal semantics, including Ladusaw & Dowty (1988), Dowty (1989), Landman (2000), and Schein (2002), casts doubt on the prospects of assigning formally defined thematic roles to all of a predicator's arguments, at least in a manner that would allow them to play a crucial part in linking. Thematic role types seem to pose problems, and there are alternatives that avoid those problems. As Carlson (1998: 35) notes about thematic roles: "It is easy to conceive of how to write a lexicon, a syntax, a morphology, a semantics, or a pragmatics without them". The three attributes ACT, UND, and SOA can capture most of what needs to be stated about linking direct arguments within HPSG. There is thus no need to posit a more extensive range of thematic roles. Moreover, because the same participant can be referenced by more than one of these attributes, it is simple to distinguish within lexical representations between, e.g., caused volitional motion or change of state (as in *jump* or *dress*), in which the values of ACT and UND are identical, and "unaccusative" verbs (such as *fall* or *vanish*), which lack an ACT in their CONTENT. In the following sections, we will see additional examples of these attributes in more complex lexical semantic representations.

# **4.5 CONTENT decomposition and ARG-ST**

Instead of thematic role types, lexical decomposition is typically used in HPSG to model the semantic side of the linking relation. The word meaning represented


by the CONTENT value is decomposed into elementary predications that share arguments, as described in Section 4.1 above and Section 5 below. Lexical decompositions cannot be directly observed, but the decompositions are justified indirectly by the roles they play in the grammar. Decompositions play a role in at least the following processes:


• *Sublexical modification.* Consider sentence (23).

(23) John sold the car, and then he bought it again.

In this sentence, the adverb *again* either adds the presupposition that John bought it before, or, in the more probable interpretation, it adds the presupposition that *the result of buying the car* obtained previously. The result of buying a car is owning it, so this sentence presupposes that John previously owned the car. Thus the decomposition of the verb *buy* includes a *possess-rel* (possession relation) holding between the buyer and the goods. This is available for modification by adverbials like *again*.

• *Argument alternations.* Some argument alternations can be modeled as the highlighting of different portions of a single lexical decomposition. See Section 5.

In general, sublexical decompositions are included in the CONTENT value only insofar as they are visible to the grammar for processes like these.

The ARG-ST list lies on the syntax side of linking. Just as the roles and predicates within CONTENT must be motivated by (linguistic) semantic considerations, the presence of elements on ARG-ST is primarily motivated by their syntactic visibility. Many ARG-ST list items are obviously justified, being explicitly expressed as subject and complement phrases, or as affixal pronouns. In addition, certain implicit arguments should appear on ARG-ST if, for instance, they are subject to the binding theory constraints that apply to ARG-ST, as discussed in Section 3.1 above.

Some implicit arguments can also participate in the syntax, for example, by acting as controllers of adjunct clauses. This could plausibly be viewed as evidence


that such arguments are present on the ARG-ST list. English rationale clauses, like the infinitival phrase in (24a), are controlled by the agent argument in the clause, *the hunter* in this example. The implicit agent of a short passive can likewise control the rationale clause as shown in (24b). But control is not possible in the middle construction (24c) even though loading a gun requires some agent. This contrast was observed by Keyser & Roeper (1984) and confirmed in experimental work by Mauner & Koenig (2000).

(24) a. The hunter loaded the shotgun quietly to avoid the possibility of frightening off the deer.
     b. The shotgun was loaded quietly to avoid the possibility of frightening off the deer.
     c. \* The shotgun had loaded quietly to avoid the possibility of frightening off the deer.

If the syntax of control is specified such that the controller of the rationale clause is an (agent) argument on the ARG-ST list of the verb, then this contrast is captured by assuming that the agent appears on the ARG-ST list of the passive verb but not the middle.

# **4.6 Modal transparency**

Another observation concerning lexical entailments and linking was developed by Koenig & Davis (2001), who point out that linking appears to ignore modal elements of lexical semantics, even when those elements invalidate entailments (expanding on an observation implicit in Goldberg 1995). For instance, there are various English verbs that display linking patterns like the ditransitive verbs of possession transfer *give* and *hand*, but which denote situations in which the transfer need not, or does not, take place. Consider (25): *offer* describes a situation where the transferor is willing to effect the transfer, *owe* one in which the transferor should effect the transfer but has not yet, *promise* one in which the transferor commits to effect the transfer, and *deny* one in which the transferor does not effect the contemplated transfer.

(25) Marge offered/owed/promised/denied Homer a chocolate donut.

Koenig & Davis argue that modal elements should be clearly separated in CONTENT values from the representations of predicators and their arguments. (26) exemplifies this factoring out of sublexical modal information from core situational information.


(26) The lexical semantic representation of *promise* (Koenig & Davis 2001: 101):

$$\begin{bmatrix}
\textit{promise-sem} \wedge \textit{cause-possess-sem} \\
\text{SIT-CORE} & \boxed{1}\begin{bmatrix}
  \textit{cause-possess-rel} \\
  \text{ACT} & a \\
  \text{UND} & \boxed{2} \\
  \text{SOA} & \begin{bmatrix} \text{SIT-CORE} & \begin{bmatrix} \textit{have-rel} \\ \text{ACT} & \boxed{2} \\ \text{UND} & u \end{bmatrix} \end{bmatrix}
\end{bmatrix} \\
\text{MODAL-BASE} & \begin{bmatrix} \textit{deontic-mb} \wedge \textit{condit-satis-mb} \\ \text{SOA} & \boxed{1} \end{bmatrix}
\end{bmatrix}$$

This pattern of linking functioning independently of sublexical modal information applies not only to these ditransitive cases, but also to verbs involving possession (cf. *own* and *obtain* vs. *lack*, *covet*, and *lose*), perception (*see* vs. *ignore* and *overlook*), and carrying out an action (*manage* vs. *fail* and *try*). Whatever the role of lexical entailments in linking, then, the modal components should be factored out, since the entailments that determine, e.g., the ditransitive linking patterns of verbs like *give* and *hand* do not hold for *offer*, *owe*, or *deny*, which display the same linking patterns. The constraints in (16)–(18) need only be minimally altered to target the value of SIT-CORE, representing the "situational core" of a relation.

This kind of semantic decomposition preserves the simplicity of linking constraints, while representing the differences between verbs that straightforwardly entail the relation between the arguments in the situational core and verbs for which those entailments do not hold, because their meaning contains a modal component restricting those entailments to a subset of possible worlds.
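The factoring-out of modal information can be pictured as linking constraints consulting only the SIT-CORE value. A minimal sketch, assuming a loose nested-dictionary rendering of (26); names and encoding are ours:

```python
# Sketch of modal transparency: linking targets the situational core
# and ignores the modal base, so give and promise link alike.

def situational_core(content):
    """Return the relation that the linking constraints see."""
    return content.get("SIT-CORE", content)

give_sem = {"SIT-CORE": {"rel": "cause-possess-rel",
                         "ACT": "a", "UND": "b"}}
promise_sem = {"SIT-CORE": {"rel": "cause-possess-rel",
                            "ACT": "a", "UND": "b"},
               "MODAL-BASE": "deontic-mb"}

# Both verbs present the same core to the linking constraints:
print(situational_core(give_sem) == situational_core(promise_sem))  # True
```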

# **4.7 Summary of linking**

In this section we have examined HPSG approaches to linking. HPSG constrains the mapping between participant roles in CONTENT and their syntactic representation on ARG-ST based on entailments of the semantic relations in CONTENT. These constraints do not require a set of thematic roles arranged in a hierarchy. Nor do they require a numerical comparison of entailments holding for each participant role, which has been an influential alternative to a thematic hierarchy. Rather, they reference the types of relations within a lexical entry's CONTENT, and the subcategorization requirements of its ARG-ST. Information from both is necessary because, although semantics is a strong determinant of argument realization, independent stipulations of subcategorization appear to be needed, too.


Finally, we have examined the role of modal information in lexical semantics, which seems not to interact much with linking, and described mechanisms proposed within HPSG that separate this information from the situational core that drives linking.

In the remainder of this chapter, we will examine the relationship of argument structure to argument alternations, including passives, as well as broader questions concerning the addition of other elements like modifiers to ARG-ST, the universality of ARG-ST across languages, and whether ARG-ST is best regarded as solely a lexical attribute or one that should also apply to phrases or constructions.

# **5 The semantics and linking of argument alternations**

A single verb can often alternate between various alternative patterns of dependent phrases, a situation called either *argument alternations*, *valence alternations*, or *diathesis alternations*. Levin (1993) lists around 50 kinds of alternations in English, and English is not untypical in this regard.

How has argument structure in HPSG been used to account for alternations? Many alternations exhibit (often subtle) meaning differences between the two alternants. We first discuss alternations due to these differences in meaning, showing how their differing ARG-ST lists arise from differences in CONTENT. We then examine some alternations where meaning differences are less apparent. Although the CONTENT values of the two alternants in such cases may not differ, we can analyze the alternation in terms of a different choice of KEY predicate in each. Lastly, we consider active-passive voice alternations, which are distinct from other alternations in important ways.

# **5.1 Meaning-based argument alternations**

One well-studied alternation, the locative alternation, is exemplified by the two uses of *spray* in (27).

(27) a. *spray-onto*: Joan sprayed paint onto the statue.
     b. *spray-with*: Joan sprayed the statue with paint.

It is typically assumed that these two different uses of *spray* in (27) have slightly different meanings, with the statue being in some sense more affected in the *with* alternant. This exemplifies the "holistic" effect of direct objecthood, which we will return to. Here, we will examine how semantic differences between alternants relate to their linking patterns. The semantic side of linking has often been


devised with an eye to syntax (e.g., Pinker 1989, and see Koenig & Davis 2006 for more examples). There is a risk of stipulation here, without independent evidence for these semantic differences. In the case of locative alternations, though, the meaning difference between (27a) and (27b) is easily stated (and Pinker's intuition seems correct), as (27b) entails (27a), but not conversely. Informally, (27a) describes a particular kind of caused motion situation, while (27b) describes a situation in which this kind of caused motion additionally results in a caused change of state. The difference is depicted in the two structures in (28).

(28) a. CAUSE (JOAN, GO (PAINT, TO (STATUE)))
b. ACT-ON (JOAN, STATUE, BY (CAUSE (JOAN, GO (PAINT, TO (STATUE)))))

This description of the semantic difference between sentences (27a) and (27b) provides a strong basis for predicting their different argument structures. But we still need to explain how linking principles give rise to this difference. Pinker's account rests on semantic structures like (28), in which depth of embedding reflects sequence of causation, with ordering on ARG-ST stemming from depth of semantic embedding, a strategy adopted in Davis (1996) and Davis (2001). This is one reasonable alternative, although the resulting complexity of some of the semantic representations raises valid questions about what independent evidence supports them. An alternative appears in Koenig & Davis (2006), who borrow from Minimal Recursion Semantics (see Koenig & Richter 2021: Section 6.1, Chapter 22 of this volume for an introduction to MRS). MRS "flattens" semantic relations, rather than embedding them in one another, so the configuration of these *elementary predications* with respect to one another is of less import. It posits a RELATIONS (or RELS) attribute that collects a set of elementary predications, each representing some part of the predicator's semantics. In Koenig & Davis' analysis, a KEY attribute specifies a particular member of RELS as the relevant one for linking (of direct syntactic arguments). In the case of (27b), the KEY is the caused change of state description. These MRS-style representations of the two alternants of *spray*, with different KEY values, are shown in (29) and (30).

(29) [ KEY 3, RELS ⟨ 3 *spray-ch-of-loc-rel* [ ACT 1, UND 2, SOA *ch-of-loc-rel* [ FIGURE 2 ] ] ⟩ ]

(30) [ KEY 3 *spray-ch-of-st-rel* [ ACT 1, UND 2, SOA *ch-of-st-rel* [ UND 2 ] ], RELS ⟨ 3, *use-rel* [ ACT 1, UND 4, SOA 3 ], *spray-ch-of-loc-rel* [ ACT 1, UND 4, SOA *ch-of-loc-rel* [ FIGURE 4 ] ] ⟩ ]

Generalizing from this example, one possible characterization of valence alternations, implicit in Koenig & Davis (2006), is as systematic relations between two sets of lexical entries in which the RELS of any pair of related entries are in a subset/superset relation (a weaker version of that definition would merely require an overlap between the RELS values of the two entries). Consider another case; (31) illustrates the causative-inchoative alternation, where the intransitive alternant describes only the change of state, while the transitive one ascribes an explicit causing agent.

(31) a. Kim broke the window.
b. The window broke.

Under an MRS representation, the change of state relation is a separate member of RELS; it is also included in the RELS of the transitive alternant, which contains a cause relation as well. Again, the RELS value of one member of each pair of related entries is a subset of the RELS value of the other.
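
To make this subset characterization concrete, here is a minimal Python sketch, with sets of relation-type names standing in for RELS values; the relation names are simplified placeholders rather than full elementary predications:

```python
# Sketch: alternants are related when their RELS values stand in a
# subset/superset relation (sets of relation-type names stand in for
# the elementary predications themselves).

def related_alternants(rels_a: frozenset, rels_b: frozenset) -> bool:
    """Strict version: one RELS value must contain the other."""
    return rels_a <= rels_b or rels_b <= rels_a

def weakly_related(rels_a: frozenset, rels_b: frozenset) -> bool:
    """Weaker version: the two RELS values merely overlap."""
    return bool(rels_a & rels_b)

# Causative-inchoative alternation of 'break' as in (31):
break_inchoative = frozenset({"ch-of-st-rel"})
break_causative  = frozenset({"cause-rel", "ch-of-st-rel"})

assert related_alternants(break_inchoative, break_causative)
assert weakly_related(break_inchoative, break_causative)
```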

Many other alternations involve one argument shifting from direct to oblique. Some English examples include conative, locative preposition drop, and *with* preposition drop alternations, as shown in (32):

(32) a. Rover clawed (at) Spot.
b. Bill hiked (along/on) the Appalachian Trail.
c. Burns debated (with) Smithers.

The direct object argument in (32a) is interpreted as more "affected" than its oblique counterparts: if Rover clawed Spot, we infer that Spot was subjected to direct contact with Rover's claws and may have been injured by them, while if Rover merely clawed *at* Spot, no such inference can be made. Similarly, to say that one has hiked the Appalachian Trail as in the transitive variant of (32b) suggests that one has hiked its entire length, while the prepositional variants merely suggest one hiked along some portion of it. In still other cases like (32c), the two variants seem to differ very little in meaning.

Beavers (2010) observes the following generalization over direct–oblique alternations: the direct variant entails the oblique one, and can have an additional entailment that the oblique variant lacks. His *Morphosyntactic Alignment Principle* (MAP) states this generalization in terms of "L-thematic roles", which are defined as sets of entailments associated with individual thematic roles:

(33) When a participant may be realized as either a direct or oblique argument of a verb V, it bears L-thematic role Q as a direct argument and L-thematic role Q′ ⊆ Q as an oblique. (Beavers 2010: 848)

Here, Q and Q′ are roles, defined as sets of individual entailments, and Q′ ⊆ Q means that Q′ is a subset of Q that is minimally different from it, differing in at most one entailment. Thus, the substantive claim is essentially that the MAP rules out "verbs where the alternating participant has MORE lexical entailments as an oblique than the corresponding object realization" (Beavers 2010: 849). The notion of a stronger role in Beavers' analysis has a rough analogue in terms of whether a particular elementary predication is present in the semantics of a particular alternant. Beavers (2005) describes a version of the Morphosyntactic Alignment Principle implemented in HPSG, which posits a separate ROLES attribute within CONTENT, containing a list of labeled roles. The ordering of roles on the ROLES list is determined at least partly by direction of causality, although this is not fully worked out. Each role can be regarded as a bundle of entailments. This bundle varies slightly between different alternants of verbs like those in (32), and the Morphosyntactic Alignment Principle comes into play, comparing the sets of entailments constituting each role. Assessing which of two roles is stronger, according to this principle, requires some additional mechanisms within HPSG that are not spelled out.
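
Because L-thematic roles are simply sets of entailments, the core of the MAP can be stated as a set comparison. The following Python sketch does so; the entailment labels are invented for illustration and are not Beavers' own:

```python
# Sketch of the MAP as a set comparison: the role borne as an oblique
# must be a subset of the role borne as a direct argument, differing
# from it by at most one entailment.

def map_compatible(direct_role: set, oblique_role: set) -> bool:
    return oblique_role <= direct_role and len(direct_role - oblique_role) <= 1

# Conative alternation (32a); entailment labels are illustrative only.
claw_direct  = {"directed-action", "contact"}  # 'Rover clawed Spot'
claw_oblique = {"directed-action"}             # 'Rover clawed at Spot'

assert map_compatible(claw_direct, claw_oblique)
# Ruled out: an alternation whose oblique variant has MORE entailments.
assert not map_compatible(claw_oblique, claw_direct)
```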

Beavers notes the resemblance between his account and numerical comparison approaches such as those of Dowty (1991) and Ackerman & Moore (2001). He points out that the direct object bears an additional entailment in each alternation. However, the specific entailment involved depends on the verb; the entailments involved in each of the examples in (32), for instance, are all different. Thus, what is crucial is the comparison of the numbers of entailments holding for a verb's arguments in each alternant, not the content of those entailments.

# **5.2 Relationships between alternants**

Having outlined the semantic basis of the different linking patterns of alternating verbs, we briefly take up two other issues. First is the question of how the alternants are related to one another. Second is how KEY selection has been used to account not just for alternants of the same verb, but for (nearly) synonymous verbs whose semantics contain the same set of elementary predications.

The hypothesis pursued in Davis (1996) and Davis (2001) is that most alternations are the consequence of classes of lexical entries having two related meanings. This follows researchers such as Pinker (1989) and Levin (1993) in modeling subcategorization alternations as underlyingly meaning alternations. This change in meaning is crucial to the Koenig & Davis (2006) KEY shifts as well. In some cases, the values of the RELS attribute of the two valence alternants differ (as in the two alternants of *spray* in the so-called *spray/load* alternation we discussed earlier). In other cases, the alternation might reflect different construals of the same event for some verbs, but not others, as Rappaport Hovav & Levin (2008) claim for the English ditransitive alternation, which adds the meaning of transfer for verbs like *send*, but not for verbs like *promise*; a KEY change would be involved (with the addition of a *cause-possess-rel*) for the first verb only. But KEY shifts and diathesis alternations do not always involve a change in meaning. The same elementary predications can be present in the CONTENT values of two alternants, with each alternant designating a different elementary predication as the KEY.

Koenig & Davis propose this not only for cases in which there is no obvious meaning difference between two alternants of a single verb, but also for different verbs that appear to be truth-conditionally equivalent. The verbs *substitute* and *replace* are one such pair. The two sentences in (34) illustrate this equivalence.

(34) a. They substituted an LED for the burnt-out incandescent bulb.
b. They replaced the burnt-out incandescent bulb with an LED.

These two verbs denote a type of event in which a new entity takes the place of an old one, through (typically intentional) causal action. Koenig & Davis decompose both verb meanings into two simpler actions of removal and placement: 'x removes y (from g)' and 'x places z (at g)', each represented as an elementary predication in the CONTENT values of these verbs. In the following two structures, adapted from their work, the *location-rel* predication represents an entity being in a location, with the value of FIG denoting the entity and the value of GRND its location.

(35) Representation of 'x places z (at g)'

(36) Representation of 'x removes y (from g)'

Either one can be selected as the KEY. In the lexical entry of *replace*, the removal predication is the value of KEY, while in the lexical entry of *substitute*, the placement of the new object is the value of KEY. (37) and (38) show the CONTENT values of these two verbs under this account, abbreviating the structures in (35) and (36). In both cases, the same linking constraints apply between the KEY and the ARG-ST list, but the two verbs have different argument realizations because their KEY values differ, even though their semantics are equivalent.
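
The effect of KEY selection can be sketched in a few lines of Python; the relation names, the ACT/UND role labels, and the one-rule linking function are drastic simplifications of the constraints in (16)–(18):

```python
# Sketch: 'replace' and 'substitute' share their RELS but select
# different KEYs, so the same linking rule yields different ARG-STs.

removal   = {"type": "removal-rel",   "ACT": "x", "UND": "y"}  # x removes y (from g)
placement = {"type": "placement-rel", "ACT": "x", "UND": "z"}  # x places z (at g)
shared_rels = [removal, placement]

def link(entry: dict) -> list:
    """Toy linking: KEY's ACT -> first ARG-ST member, UND -> second."""
    key = entry["KEY"]
    return [f"NP[{key['ACT']}]", f"NP[{key['UND']}]"]

replace_entry    = {"RELS": shared_rels, "KEY": removal}
substitute_entry = {"RELS": shared_rels, "KEY": placement}

print(link(replace_entry))     # ['NP[x]', 'NP[y]']: 'replace the old bulb ...'
print(link(substitute_entry))  # ['NP[x]', 'NP[z]']: 'substitute an LED ...'
```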

 


As a final example of the effect of alternations on fine-grained aspects of verb meaning, we consider the source-final product alternation exemplified in (39), where the direct object can be either the final product or the material source of the final product.

(39) a. Kim made/carved/sculpted/crafted a toy out of the wood.
b. Kim made/carved/sculpted/crafted the wood into a toy.

Davis proposes that the sentences in (39) involve an alternation between the two meanings represented in (40), each associated with a distinct entry. We adapt Davis (2001) to make it consistent with Koenig & Davis (2006) and also treat the alternation as an alternation of *entries* with distinct meanings. Lexical rules are a frequent analytical tool for modeling alternations between two related meanings of a single entry, like those illustrated in (40). One potential drawback of a lexical rule approach to valence alternations is that it requires selecting one alternant as basic and the other as derived. This is not always an easy decision, as Goldberg (1991: 731–732) and Levin & Rappaport Hovav (1994) have pointed out (e.g., is the inchoative or the causative basic?). Sometimes, morphology provides a clue, although in different languages the clues may point in different directions. French and other Romance languages use a "reflexive" clitic as a detransitivizing affix. In English, though, there is no obvious "basic" form or directionality. It is to avoid committing ourselves to a directionality in the relation between the semantic contents described in (40) that we eschew treating it as a lexical rule. (Identically numbered tags in (40a) and (40b) indicate structure-sharing, and labels such as *final-product* and *source-material* are informal and added for clarity.)

# **5.3 The problem of passives**

Although most diathesis alternations can be modeled as alternations in meaning or as KEY shifts, some arguably cannot. One prominent example is the active/passive alternation. Other widely attested constructions, such as raising constructions, similarly involve no change in meaning, but we will examine only passives here.

The semantics of actives and corresponding long passives, as in (41), are practically identical and the difference between the two alternants is pragmatic in nature.

(41) a. Fido dug a couple of holes.
b. A couple of holes were dug by Fido.

In this section, we outline two possible approaches to the passive. Both of them treat the crucial characteristic of passivization as subject demotion (see Blevins 2003 for a thorough exposition of this characterization), rather than object advancement, as proposed, e.g., in Relational Grammar (Perlmutter & Postal 1983). As we will see, there are various options for implementing this general idea of demotion within HPSG.

The first approach, which goes back to Pollard & Sag (1987: 215), assumes that passivization targets the first member of a SUBCAT list and either removes it or optionally puts it last on the list, but as a PP. This approach is illustrated in (42), a possible formulation of a lexical rule for transitive verbs adapted to a theory that replaces SUBCAT with ARG-ST, as discussed in Manning et al. (1999: 67). See Müller (2003) for a more refined formulation of the passive lexical rule for German that accounts for impersonal passives, and Blevins (2003) for a similar analysis. The first NP is demoted and either does not appear on the output's ARG-ST or is a PP coindexed with the input's first NP's index.

Linking in passives thus violates the constraints in (16)–(18), specifically (16), which links the value of ACT to the first element of ARG-ST. We use one possible feature-based representation for lexical rules to help compare approaches to passives. See Meurers (2001) and Davis & Koenig (2021), Chapter 4 of this volume for a discussion of various approaches to lexical rules. Note that we use the attribute LEX-DTR rather than the IN(PUT) attribute used in the representation of lexical rules in Meurers (2001: 76), as, like him, we wish to avoid any procedural implications; nothing substantial hinges on this labeling change.

(42) Passive lexical rule:

[ *passive-verb*
ARG-ST 1 ⊕ ⟨ (PP[*by*]_i) ⟩
LEX-DTR [ *stem*, HEAD *verb*, ARG-ST ⟨ NP_i ⟩ ⊕ 1 ⟨ NP, … ⟩ ] ]
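
The list manipulation in (42) is easy to emulate. In the following Python sketch, (category, index) pairs stand in for synsem objects; whether the *by*-phrase is added is a parameter of the toy function, mirroring the optionality of the PP:

```python
# Sketch of the non-canonical linking rule in (42): the first NP of
# ARG-ST is demoted, and either disappears or recurs as a final PP[by]
# bearing the same index.

def passivize(arg_st: list, with_by_phrase: bool = False) -> list:
    (cat, idx), rest = arg_st[0], arg_st[1:]
    assert cat == "NP", "the rule targets stems with an initial NP"
    return rest + ([("PP[by]", idx)] if with_by_phrase else [])

dig = [("NP", "i"), ("NP", "j")]            # 'Fido dug a couple of holes'
print(passivize(dig))                       # [('NP', 'j')]                  short passive
print(passivize(dig, with_by_phrase=True))  # [('NP', 'j'), ('PP[by]', 'i')] long passive
```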

We will refer to this approach as the non-canonical linking analysis of passives. This kind of analysis invites at least three questions. First, as already noted, the constraint linking the value of ACT to the first element of ARG-ST is violated. If passives — widespread and hardly exotic constructions — violate canonical linking constraints, how strong an account of linking can be maintained? Second, what other predictions, such as changes in binding behavior, control constructions, and discourse availability, arise from the altered ARG-ST of passives? Third, what is the status of the *by*-phrase in long passives, and how is it represented on the ARG-ST list?

Another approach maintains the ARG-ST list of the active verb in its passive counterpart, thereby preserving linking constraints. Passives differ from actives under this account in their non-canonical mapping from ARG-ST to valence lists; the subject is not the first element of the ARG-ST list. This analysis bears some resemblance to the distinction between macro-roles and syntactic pivots in Role and Reference Grammar, with passives having a marked mapping from macro-roles to syntactic pivot (Van Valin & LaPolla 1997). In this kind of approach, the passive subject might be the second element of the ARG-ST list, as in a typical personal passive, or an expletive element, as in impersonal passives. In a long passive, the first element of ARG-ST is coindexed with a PP on the COMPS list or an adjunct. This analysis is reminiscent of the account of Balinese objective voice presented in Section 3.3 in that the account of both phenomena uses a non-canonical mapping between ARG-ST and valence lists. A version of this view is proposed by Davis (2001: 246), who suggests the representation in (43) for passive lexemes (as before, we substitute the attribute name LEX-DTR for IN).

(43) Passive lexical rule:

[ *passive-verb*
SUBJ 1
COMPS 2
ARG-ST 3 ⟨ (XP) ⟩ ⊕ 1 ⊕ 2
LEX-DTR [ *trans-stem*, ARG-ST 3, CONTENT 4 ]
CONTENT 4 ]
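
Under this analysis it is the mapping to the valence lists, not ARG-ST itself, that changes. A companion Python sketch to the previous one, under the same simplifications:

```python
# Sketch of (43): actives and passives share their ARG-ST; the passive
# maps it non-canonically, realizing the second member as the subject
# and leaving the first (the logical subject) unrealized on the
# valence lists.

def active_valence(arg_st: list) -> dict:
    return {"SUBJ": [arg_st[0]], "COMPS": arg_st[1:]}

def passive_valence(arg_st: list) -> dict:
    return {"SUBJ": [arg_st[1]], "COMPS": arg_st[2:]}  # arg_st[0] stays ARG-ST-only

dig = [("NP", "i"), ("NP", "j")]
print(active_valence(dig))   # {'SUBJ': [('NP', 'i')], 'COMPS': [('NP', 'j')]}
print(passive_valence(dig))  # {'SUBJ': [('NP', 'j')], 'COMPS': []}
```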

We will refer to this as the non-canonical argument realization analysis of passives. Again, at least three issues must be addressed. First, the standard mapping between the elements of ARG-ST and those of the valence lists is violated. If passives violate these canonical mapping constraints, how strong an account of the relationship between ARG-ST and valence can be maintained? Second, as with the non-canonical linking analysis, what predictions, such as changes in binding behavior, control constructions, and discourse availability, arise from the non-canonical valence values in passives? Third, what is the status of the *by*-phrase in long passives, and how is it represented on the ARG-ST list? If the logical subject remains the first element of a passive verb's ARG-ST list, does it appear as an additional oblique element on ARG-ST as well?

The implications of weakening canonical constraints under each of these analyses have not been thoroughly addressed, to our knowledge. We are unaware, for example, of proposals that limit non-canonical linking in HPSG to only the kind observed in passives. One might begin by stipulating that linking concerns only NP (i.e., "direct") arguments on ARG-ST, but the implications of this have not yet been well explored. Another possibility is to limit where linking constraints can apply. They might be restricted to apply to words whose MORPH or BASE feature values are of type *lexeme* not *word* (Runner & Aranovich 2003: 362) or they might apply to basic (underived) lexemes only, but not to their derived passive forms (this is the strategy adopted by the CoreGram project, Müller 2015). Difficulties might arise in the latter case, however, in cases where derived forms add directly linked arguments, such as morphological causatives and applicatives. Müller (p. c. 2021) therefore assumes that when a lexical rule alters semantics – as in causatives and applicatives, but not passives – linking constraints must apply to its output. With respect to the non-canonical argument realization analysis, the required variation in ARG-ST to valence mappings has been investigated somewhat more (see Section 3 for some details), especially in connection with ergativity and voice alternations, and also in analyses of pro-drop, cliticization, and extraction (Miller & Sag 1997; Manning et al. 1999; Bouma et al. 2001). Thus, there are some independent motivations for positing non-canonical mappings between ARG-ST and valence lists. But we will leave matters here in regard to the general advantages and drawbacks of non-canonical linking versus non-canonical argument realization.

As for passives in particular, the two analyses make different predictions regarding binding and control by the "logical subject" (the subject of the corresponding actives). Under the non-canonical linking analysis, it is not present on the ARG-ST (and valence) lists of short passives, so it is predicted to be unavailable to any syntactic process that depends on elements of ARG-ST. Binding and varieties of control that reference these elements therefore cannot involve the logical subject. Under the non-canonical argument realization analysis, the logical subject is present on the ARG-ST lists of short passives, so it is predicted to play much the same role in binding as it does in corresponding actives. However, we can see that, at least when unexpressed, this is not the case, as in (44).

(44) \* The money was sent to himself. (himself intended to refer to the sender)

Certain control constructions also illustrate this point. While the unexpressed logical subject can control rationale clauses in English, as exemplified above in (24b), not all cases of control exhibit parallel behavior. The Italian consecutive *da* + infinitive construction (Perlmutter 1984; Sanfilippo 1998) appears to be controlled by the surface subject, as shown in (45).

	- b. Eva Eva fu was rimproverata scolded da by Gino Gino tante so.many volte times da so.as arrabbiarsi. to.get.angry 'Eva was scolded by Gino so many times that \*he/she got angry.'

Although there are other factors involved in the choice of controller of consecutive *da* infinitive constructions, it is clear that the logical subject in the passivized main clause cannot control the infinitive. Thus, even if it remains the initial element of the passive verb's ARG-ST, it must be blocked as a controller. Sanfilippo argues from these kinds of examples that the passive *by*-phrase should be regarded as a "thematically bound" (i.e., linked) adjunct that does not appear on the passive verb's ARG-ST list, but on the SLASH list. However, this would require some additional mechanism to explain the involvement of the *by*-phrase in binding, noted below, and it sits uneasily with other evidence for including adjuncts on ARG-ST, discussed in Section 6.

In addition, the implicit agent of short passives is "inert" in discourse, as discussed in Koenig & Mauner (1999). It cannot serve as an antecedent of cross-sentential pronouns without additional inferences, as shown in (46), where the referent of *he* cannot without additional inference be tied to the logical subject argument of *killed*, i.e., the killer.

(46) # The president was killed. He was from Iowa.

Note that the discourse inertness of the implicit agent in (46) does not follow from its being unexpressed, as shown by the indefinite use of the subject pronoun *on* in French (Koenig 1999: 241–244) or Hungarian bare singular objects (Farkas & de Swart 2003: 89–108). These, though syntactically expressed, do not introduce discourse referents either. In such cases, as well as in passives under the non-canonical argument realization analysis, the first member of the ARG-ST list must therefore be distinguished from indices that introduce discourse referents.

These facts would seem to favor the non-canonical linking analysis. However, there are options for representing the inertness of the logical subject under the non-canonical argument realization analysis. One possibility is to introduce a special subtype of the type *index*, which we could call *inert* or *null*; by stipulation, it could not correspond to a discourse referent. This is also one way to treat the inertness of expletive pronouns, so it has some plausible independent motivation. Unlike expletive pronouns, the logical subject of passives is linked to an index in CONTENT. Its person and number features therefore cannot be assigned as defaults (e.g., third person singular *it* and *there* in English), but must correspond to those of the entity playing the relevant semantic role in CONTENT. Davis (2001: 251–253) offers a slightly different alternative using the dual indices INDEX and A-INDEX, following the distinction between AGR and INDEX used in Kathol (1999: 240–250) to model different varieties of agreement. The A-INDEX of a passive verb's logical subject is of type *null*, which, by stipulation, can neither o-command other members of ARG-ST nor appear on valence lists. In both impersonal and short personal passives, the logical subject is coindexed with a role in CONTENT representing an unspecified human (or animate). These analyses of logical subject inertness have not been pursued, however.

Finally, we turn to *by*-phrases in long passives. In languages like English, a *by*-phrase can express the lexeme's logical subject. Under both the non-canonical linking and non-canonical argument realization analyses, this might be represented as an optional oblique complement on ARG-ST, as indicated in (42) and (43), respectively. As noted, the non-canonical argument realization analysis would then posit that the ARG-ST of passives includes two members that correspond to the same argument, which again shows the need for an inert first element of ARG-ST. Another possibility is to treat such *by*-phrases as adjuncts (and therefore not part of the ARG-ST list); see Höhle (1978: Chapter 7) and Müller (2003: 292–294) for German and Jackendoff (1990: 180) for English. There is evidence, however, that *by*-phrases can serve as antecedents of anaphors in at least some languages. Collins (2005: 111) cites sentences like (47), which suggest that the complement of *by*-phrases can bind a reciprocal.

(47) The packages were sent by the children to each other.

Acceptability judgements of this and similar examples vary, but they are certainly not outright unacceptable. Likewise, Perlmutter (1984: 10) furnishes Russian examples in which the logical subject (realized as an instrumental case NP) binds a reflexive (note that the English translation of it is also fairly acceptable).

(48) Eta this kniga book byla was kuplena bought Boris-om Boris-INS dlja for sebja self (Russian) 'This book was bought by Boris for himself.'

Given that binding is a relation between members of the ARG-ST list, such data would seem problematic for an approach that does not include *by*-phrases on the ARG-ST list. Interestingly, Perlmutter also argues that Russian *sebja* is subject-oriented (see Müller 2021a: Section 4, Chapter 20 of this volume). The instrumental NP *Borisom* can bind *sebja* only because it corresponds to the subject of active *kupit'* 'buy'. Assuming that is correct, an HPSG account of Russian passives would need some means of representing the logical subjecthood of these instrumental NPs; this might involve some way of accessing their active counterpart's SUBJ value, or of referencing the first element of the passive verb's ARG-ST list, despite its inertness.

The interaction of binding and control with passivization across languages appears to be varied, and as we have noted, we are not aware of systematic investigations into this variation and possible accounts of it within HPSG. Here, we have surveyed these phenomena and two possible approaches, while noting that some key issues remain unresolved. Notably, both of these approaches introduce non-canonical lexical items, violating either linking or argument realization constraints that otherwise have strong support. Further work is required to ensure that these constraints can be preserved in a meaningful way, as opposed to allowing non-canonical structures to appear freely in the lexicon.

# **5.4 Summary**

We have examined in this section several approaches to argument alternations in HPSG and their implications for ARG-ST. For alternations based on semantic differences, different alternants will have different CONTENT values, and linking principles like those we outlined in the previous section account for their syntactic differences. Even where such meaning differences are small, there are differing semantic entailments that can affect linking. For some cases where there seems to be no discernible meaning difference between alternants, it is still possible for linking principles to yield syntactic differences, if the alternants select different KEY predications in CONTENT. The active/passive alternation, however, cannot be accounted for in such a fashion, as it applies to verbs with widely varying CONTENT values. HPSG accounts of passives therefore resort to lexical items that are non-canonical, either in their linking or in their mapping between ARG-ST and valence. Both of these are ways of modeling the demotion of the logical subject. But there is as yet no consensus within the HPSG community on the correct analysis of passives.

# **6 Extended ARG-ST**

Most of this chapter focuses on cases where semantic roles linked to the ARG-ST list are arguments of the verb's core meaning. But in quite a few cases, complements (or even subjects) of a verb are not part of this basic meaning; consequently, the ARG-ST list must be extended to include elements that go beyond it. We consider three cases here, illustrated in (49)–(51).

Resultatives, illustrated in (49), express an effect, which is caused by an action of the type denoted by the basic meaning of the verb. The verb *fischen* 'to fish' is a simple intransitive verb (49a) that does not entail that any fish were caught, or any other specific effect of the fishing (see Müller 2002: 219–220).

(49) a. dass that er he fischt fishes (German) 'that he is fishing'

b. dass that er he ihn it leer empty fischt fishes 'that he is fishing it empty'

c. wegen because.of der the Leerfischung empty.fishing der of.the.GEN Nordsee<sup>4</sup> North.Sea 'because of the North Sea being fished empty'

In (49b) we see a resultative construction with an object NP and an adjectival secondary predicate. The meaning is that he is fishing, causing it (the body of water) to become empty of fish. Müller (2002: 241) posits a lexical rule for German applying to the verb that augments the ARG-ST list with an NP and AP, and adds the causal semantics to the CONTENT (see Wechsler 2005 for a similar analysis of English resultatives and Müller 2018: Section 7.2.3 for an updated lexical rule and interactions with benefactives). The existence of deverbal nouns like *Leerfischung* 'fishing empty', which takes the body of water as an argument in genitive case (see (49c)), confirms that the addition of the object is a lexical process, as noted by Müller (2002).

Romance clause-union structures as in (50) have long been analyzed as cases where the arguments of the complement of a clause-union verb (*faire* in (50)) are complements of the clause-union verb itself (Aissen 1979).

<sup>4</sup>die tageszeitung, 1996-06-20, p. 6.

(50) Johanna Johanna a has fait made manger eat les the enfants. children (French) 'Johanna had the children eat.'

Within HPSG, the "union" of the two verbs' dependents is modeled via the composition of ARG-ST lists of the clause union verb, following Hinrichs & Nakazawa (1994) (this is a slight simplification; see Godard & Samvelian 2021, Chapter 11 of this volume for details).

Abeillé & Godard (1997) have argued that many adverbs, including *souvent* 'often' in (51), and negative particles or adverbs in French are complements of the verb, and Kim & Sag (2002) extended that view to some uses of negation in English. Such analyses hypothesize that some semantic modifiers are realized as complements, and thus should be added as members of ARG-ST (or members of the DEPS list, if one countenances such an additional list; see below). In contrast to resultatives, which affect the meaning of the verb, or to clause union, where one verb co-opts the argument structure of another verb, what is added to the ARG-ST list in these cases is typically considered a semantic adjunct and a modifier in HPSG (thus it selects the verb or VP via the MOD attribute).


Another case of an adjunct that behaves like a complement is found in (52), taken from Koenig & Davis (2006: 81). The clitic *en* expressing the cause of death is not normally an argument of the verb *mourir* 'die', but rather an adjunct:

(52) Il he en of.it est is mort. dead.PFV.PST (French) 'He died of it.'

On the widespread assumption (at least within HPSG) that pronominal clitics are verbal affixes (Miller & Sag 1997), the adjunct cause of the verb *mourir* must be represented within the entry for *mourir*, so as to trigger affixation by *en*. Bouma et al. (2001) discuss cases where "adverbials", as they call them, can be part of a verb's lexical entry. To avoid mixing those adverbials with the argument structure list (and having to address their relative obliqueness with respect to the syntactic arguments of verbs), they introduce an additional list, the dependents list (abbreviated as DEPS), which includes the ARG-ST list but also a list of adverbials. Each adverbial selects for the verb on whose DEPS list it appears as a dependent, as shown in (53). But, of course, not all verb modifiers can be part of the DEPS list,<sup>5</sup> and Bouma, Malouf, and Sag discuss at length some of the differences between the two kinds of "adverbials".

(53) *verb* ⇒ [ HEAD 1, ARG-ST 2, DEPS 2 ⊕ *list*([ MOD [ HEAD 1 ] ]) ]

Although the three cases we have outlined all result in an extended ARG-ST, the ways in which this extension arises differ. In the case of resultatives, the extension results partly or wholly from changing the meaning in a way similar to Rappaport Hovav & Levin (1998): a causal relation is added, as in (54) for example, and the effect argument of this causal relation is added to the base ARG-ST list (see Section 5 for a definition of the attributes KEY and RELS; here it suffices to note that a *cause-rel* is added to the list of relations that are the input of the rule).

(54) [ KEY 1, RELS 2 ⟨ …, 1, … ⟩ ] ↦ [ KEY 3 *cause-rel*, RELS 2 ⊕ ⟨ 3 ⟩ ]

The entries of clause union verbs are simply stipulated to include on their ARG-ST lists the syntactic arguments of their (lexical) verbal arguments, as shown in (55); see Godard & Samvelian (2021), Chapter 11 of this volume for details on approaches to complex predicates in HPSG.

(55) [ ARG-ST ⟨ …, [ HEAD *verb*, ARG-ST 1 ] ⟩ ⊕ 1 ]

Finally, (negative) adverbs that select for a verb (VP) are added to the ARG-ST of the verb they select, as shown in (56). The symbol ◯ in this rule is known as "shuffle"; it represents any list containing the combined elements of the two lists, with the relative ordering of elements within each list preserved.<sup>6</sup>

(56) [ ARG-ST 1 ] ↦ [ ARG-ST 1 ◯ ⟨ Adv ⟩ ]
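
Shuffle is straightforward to implement; the following Python sketch enumerates every interleaving of two lists while preserving the internal order of each:

```python
# Shuffle: all interleavings of two lists in which the relative order
# of the elements of each input list is preserved.

def shuffle(xs: list, ys: list):
    if not xs or not ys:
        yield xs + ys
        return
    for rest in shuffle(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in shuffle(xs, ys[1:]):
        yield [ys[0]] + rest

# Shuffling an Adv into a transitive ARG-ST, as in (56):
for order in shuffle(["NP[nom]", "NP[acc]"], ["Adv"]):
    print(order)
# ['NP[nom]', 'NP[acc]', 'Adv']
# ['NP[nom]', 'Adv', 'NP[acc]']
# ['Adv', 'NP[nom]', 'NP[acc]']
```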

<sup>5</sup>See Müller (1999: Section 20.4.1) and Müller (2021a: Section 6.1), Chapter 20 of this volume for possible binding conflicts arising when adjuncts within the nominal domain are treated this way.

<sup>6</sup>See Müller (2021b: 391), Chapter 10 of this volume for more on the shuffle operator.

# **7 Is ARG-ST universal?**

HPSG's ARG-ST attribute does not seem to be a universal property of natural language grammars. The ARG-ST feature is the intermediary between, on the one hand, a semantic representation of an event or state in which participants fill specific roles, and on the other, their syntactic and morphological expression. ARG-ST is defined as a list of *synsem* objects in the entry for a verb lexeme, and is used to model grammatical regularities of particular predicators or sets of predicators, in particular which members of that list:

	- 1. are identified with valence list items representing grammatical properties of phrasal dependents (subject and complements),
	- 2. determine verbal morphology, or
	- 3. are left unexpressed.

Koenig & Michelson (2014; 2015a,b) argue that the grammatical encoding of semantic arguments in Oneida (Northern Iroquoian) does not display any of these properties. In fact, the only function of the corresponding intermediate representation in Oneida is to distinguish the arguments of a verb for the purpose of determining verbal prefixes indicating semantic person, number, and gender features of animate arguments. For example, the prefix *lak-* occurs if a third-person singular masculine proto-agent argument is acting on a first-person singular proto-patient argument, as in *lak-hlo·lí-heʔ* 'he tells me' (habitual aspect), whereas the prefix *li-* occurs if a first-person singular proto-agent argument is acting on a third-person masculine singular argument, as in *li-hlo·lí-heʔ* 'I tell him' (habitual aspect). As there is no syntactic agreement, these verbal prefixes encode purely semantic features. *Synsem* objects are therefore not appropriate for this intermediate representation; all that is needed are semantic indices distinguishing the (at most two) animate co-arguments bearing fixed argument roles for each verb. Koenig & Michelson use the attribute INFL-STR – a highly restricted Oneida feature that replaces ARG-ST – whose value is a list of referential indices for animate arguments within the inflectional information associated with each verb (see Crysmann 2021: Section 4, Chapter 21 of this volume for more details on current theories of inflectional morphology in HPSG). If Koenig & Michelson are correct, the ARG-ST list may thus not be a universal attribute of words, though it is present in the overwhelming majority of languages. Linking, understood as constraints between semantic roles and members of the ARG-ST list, is then but one possibility; constraints that relate semantic roles to an INFL-STR list of semantic indices are also an option. In languages that exclusively exploit that latter possibility, syntax is indeed simpler.

# **8 The lexical approach to argument structure**

We end this chapter with a necessarily brief comparison between the approach to argument structure we have described here and other approaches to argument structure that have developed since the 1990s. This chapter describes a *lexical approach to argument structure*, which is typical of research in HPSG. The basic tenet of such approaches is that lexical items include argument structures, which represent essential information about potential argument selection and expression, but abstract away from the actual local phrasal structure. In contrast, *phrasal approaches*, which are common both in Construction Grammar and in transformational approaches such as Distributed Morphology, reject such lexical argument structures. Let us briefly review the reasons for preferring a lexical approach. (This section is drawn from Müller & Wechsler 2014, which may be consulted for more detailed and extensive argumentation. See also Müller 2021c: Section 2, Chapter 32 of this volume.)

In phrasal approaches to argument structure, components of a verb's apparent meaning are actually "constructional meaning" contributed directly by the phrasal structure. The linking constraints of the sort discussed above are then said to arise from the interaction of the verb meaning with the constructional meaning. For example, agentive arguments tend to be realized as subjects, not objects, of transitive verbs. On the theory presented above, that generalization is captured by the linking constraint (16), which states that the ACTOR argument of an *act-und-rel* (actor-undergoer relation) is mapped to the initial item in the ARG-ST list. In a phrasal approach, the agentive semantics is directly associated with the subject position in the phrase structure. In transformational theories, a silent "light verb" (usually called "little *v*") heads a projection in the phrase structure and assigns the agent role to its specifier (the subject). In constructional theories, the phrase structure itself assigns the agent role. In either type of phrasal approach, the agentive component of the verb meaning is actually expressed by the phrasal structure into which the verb is inserted.

The lexicalist's approach to argument structure provides essential information for a verb's potential combination with argument phrases. If a given lexical entry could only combine with the particular set of phrases specified in a single valence feature, then the lexical and phrasal approaches would be difficult to distinguish: whatever information the lexicalist specifies for each valence list item could, on the phrasal view, be specified instead for the phrases realizing those list items. But crucially, the verb need not immediately combine with its specified arguments. Alternatively, it can meet other fates: it can serve as the input to a lexical rule; it can combine first with a modifier in an adjunction structure; it can be coordinated with another word with the same predicate argument structure; instead of being realized locally, one or more of its arguments can be effectively transferred to another head's valence feature (raising or argument composition); or arguments can be saved for expression in some other syntactic position (partial fronting).<sup>7</sup> Here we consider two of these, lexical rules and coordination.

The lexically encoded argument structure is abstract: it does not directly encode the phrase structure or precedence relations between this verb and its arguments. This abstraction captures the commonality across different syntactic expressions of the arguments of a given root.

(57) a. The rabbits nibbled the carrots.
	- b. The carrots were being nibbled (by the rabbits).
	- c. a large, partly nibbled, orange carrot
	- d. the quiet, nibbling, old rabbits
	- e. the rabbit's nibbling of the carrots
	- f. The rabbit gave the carrot a nibble.
	- g. The rabbit wants a nibble (on the carrot).
	- h. The rabbit nibbled the carrot smooth.

Verbs undergo morpholexical operations like passive (57b), as well as antipassive, causative, and applicative in other languages. They have cognates in other parts of speech such as adjectives (57c, d) and nouns (57e, f, g). Verbs have been argued to form complex predicates with resultative secondary predicates (57h), and with serial verbs in other languages.

<sup>7</sup>See Müller 2021c: Section 2.2, Chapter 32 of this volume for discussion of partial verb phrase fronting.

The same root lexical entry *nibble*, with the same meaning, appears in all of these contexts. The effects of lexical rules together with the rules of syntax dictate the proper argument expression in each context. For example, if we call the first two arguments in an ARG-ST list (such as the one in (57) above) Arg1 and Arg2 (or ACT or UND), respectively, then in an active transitive sentence Arg1 is the subject and Arg2 the object; in the passive, Arg2 is the subject and the referential index of Arg1 is optionally assigned to a *by*-phrase. The same rules of syntax dictate the position of the subject, whether the verb is active or passive. When adjectives are derived from verbal participles, whether active (*a nibbling rabbit*) or passive (*a nibbled carrot*), the rule is that whichever role would have been expressed as the subject of the verb is assigned by the participial adjective to the referent of the noun that it modifies; see Bresnan (1982: 21–32) and Bresnan et al. (2016: Chapter 3). The phrasal approach, in which the agent role is assigned to the subject position, is too rigid.

This issue cannot be solved by associating each syntactic environment with a different meaningful phrasal construction: an active construction with agent role in the subject position, a passive construction with agent in the *by*-phrase position, etc. The problem for that view is that one lexical rule can feed another. In the example above, the output of the verbal passive rule (see (57b)) feeds the adjective formation rule (see (57c)).
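
The feeding relation is naturally modeled as function composition over lexical descriptions. A toy Python sketch, with dictionaries standing in for lexical entries and the rules reduced to the bare minimum:

```python
# Sketch: the verbal passive rule feeds adjective formation, as in the
# derivation of 'nibbled' in (57c). Entries are toy dictionaries.

def passive(verb: dict) -> dict:
    # Arg2 becomes the subject role; Arg1 is suppressed (or a by-phrase).
    return {"cat": "verb", "form": verb["form"] + "d",
            "subj": verb["args"][1], "args": verb["args"][1:]}

def participial_adjective(verb: dict) -> dict:
    # The role that would be the verb's subject is assigned to the
    # referent of the modified noun (cf. Bresnan 1982).
    return {"cat": "adj", "form": verb["form"], "modifies": verb["subj"]}

nibble = {"cat": "verb", "form": "nibble",
          "subj": "agent", "args": ["agent", "patient"]}

print(participial_adjective(passive(nibble)))
# {'cat': 'adj', 'form': 'nibbled', 'modifies': 'patient'} -> 'nibbled carrot'
```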

A verb can also be coordinated with another verb with the same valence requirements. The two verbs then share their dependents. This causes problems for the phrasal view, especially when a given dependent receives different semantic roles from the two verbs. For example, in an influential phrasal analysis, Hale & Keyser (1993) derived denominal verbs like *to saddle* through noun incorporation out of a structure akin to [PUT a saddle ON x]. Verbs with this putative derivation routinely coordinate and share dependents with verbs of other types:

(58) Realizing the dire results of such a capture and that he was the only one to prevent it, he quickly [saddled and mounted] his trusted horse and with a grim determination began a journey that would become legendary.<sup>8</sup>

Under the phrasal analysis, the two verbs place contradictory demands on a single phrase structure. But on the lexical analysis, this is simple V<sup>0</sup> coordination.<sup>9</sup>

<sup>8</sup>Example from "Jack Jouett House Historic Site: Jack Jouett's Ride"; http://jouetthouse.org/jackjouetts-ride/, 19.03.2021

<sup>9</sup>See also Abeillé & Chaves (2021: Section 5), Chapter 16 of this volume on lexical coordination and for arguments why approaches assuming phrasal coordination for these cases fail.

To summarize, a lexical argument structure is an abstraction or generalization over various occurrences of a predicator in syntactic contexts. To be sure, one key use of that argument structure is simply to indicate what sort of words or phrases the predicator must (or can) combine with; if that were the whole story, then the phrasal theory would be viable. But it is not. As it turns out, lexically encoded valence structure, once abstracted, can alternatively be used in other ways: among other possibilities, the predicator (crucially including its valence structure) can be coordinated with other predicators that have a similar valence structure, or it can serve as the input to lexical rules specifying a new word or lexeme bearing a systematic relation to the input word. The phrasal approach prematurely commits to a single phrasal position for the realization of a semantic argument. In contrast, a lexical argument structure gives a word the appropriate flexibility to account for the full range of expressions found in natural language.

# **Acknowledgments**

We thank Stefan Müller for so carefully reading and commenting on several versions of the manuscript and helping us improve this chapter considerably. We thank Elizabeth Pankratz for editorial comments and proofreading.

# **References**



Stanford, CA: CSLI Publications. http://cslipublications.stanford.edu/HPSG/2003/koenig-davis.pdf (9 February, 2021).



*mar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 889–944. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599858.


*pragmatischen Fundierung* (Studien zur deutschen Grammatik 15), 171–211. Tübingen: originally Gunter Narr Verlag now Stauffenburg Verlag.



Wechsler, Stephen & I. Wayan Arka. 1998. Syntactic ergativity in Balinese: An argument structure based theory. *Natural Language & Linguistic Theory* 16(2). 387–441. DOI: 10.1023/A:1005920831550.

# **Chapter 10**

# **Constituent order**

# Stefan Müller

Humboldt-Universität zu Berlin

This chapter discusses local ordering variants and how they can be analyzed in HPSG. So-called scrambling, the local reordering of arguments of a head, can be accounted for by assuming flat rules or binary branching rules with arbitrary order of saturation. The difference between SVO and SOV is explained by assuming different mappings between the argument structure list (a list containing all arguments of a head) and valence features for subjects and complements. The position of the finite verb in initial or final position in languages like German can be accounted for by flat rules and a separation between immediate dominance and linear precedence information or by something analogous to head movement in transformational approaches. The chapter also addresses the analysis of languages allowing even more freedom than just scrambling arguments. It is shown how one such language, namely Warlpiri, can be analyzed with so-called constituent order domains allowing for discontinuous constituents. I discuss problems of domain-based approaches and provide an alternative account of Warlpiri that does not rely on discontinuous constituents.

# **1 Introduction**

This chapter deals with constituent order, with a focus on local order variants. English is the language that is treated most thoroughly in theoretical linguistics but is probably also a rather uninteresting language as far as the possibilities of reordering constituents are concerned: the order of subject, verb, and object is fixed in sentences like (1):

(1) Kim likes bagels.

Stefan Müller. 2021. Constituent order. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 369–417. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599836

Of course, there is the possibility of fronting the object as in (2), but this is a special, non-local construction that is not the topic of this chapter; it is treated in Borsley & Crysmann (2021), Chapter 13 of this volume.

(2) Bagels, Kim likes.

This chapter deals with scrambling (the local reordering of arguments) and with alternative placements of heads (called *head movement* in some theories). Examples of the former are the subordinate clauses in (3) and an example of the latter is given in (4):



(3) shows that in addition to the unmarked order in (3a) (see Höhle (1982) on the notion of unmarked order), five other argument orders are possible in sentences with three-place verbs. As with the examples just given, I will use German if a phenomenon does not exist in English. Section 6.2 discusses examples from Warlpiri, a language having even freer constituent order.

(4) shows that the verb is placed in initial position in yes/no questions in German. This contrasts with the verb-final order in the subordinate clause in (3a), which has the same order as far as the arguments are concerned. This alternation of verb placement is usually treated as head movement in the transformational literature (Bach 1962; Bierwisch 1963: 34; Reis 1974; Thiersch 1978: Chapter 1). Declarative main clauses in German are V2 clauses and the respective fronting of the preverbal constituent is usually treated as a non-local dependency (see Borsley & Crysmann 2021, Chapter 13 of this volume). Hence, V2 sentences will not be handled here.

The following sections explore the theoretical options within the HPSG framework for dealing with these phenomena. I first discuss the separation of grammar rules into an immediate dominance part and a linear precedence component in Section 2 and then flat vs. binary branching structures (Section 3). While flat structures allow verbs to be ordered clause-finally or clause-initially, this is not the case for binary branching structures, since only sisters can be ordered. So, for (3a) one would get the bracketing in (5a). If *das Buch* 'the book' and *gibt* 'gives' are ordered differently, (5b) results.


Hence, local reordering is not sufficient to derive clause-initial verb order, and therefore proposals with binary branching structures are usually paired with HPSG's analogue of head movement in transformational theories. These analyses are explained in Section 5. Section 6 introduces an extension to standard HPSG developed by Reape (1994): constituent order domains. Such constituent order domains allow for discontinuous constituents and have been used to account for languages like Warlpiri (Donohue & Sag 1999). In contrast, Section 7 shows how such languages can be analyzed without admitting discontinuous constituents.

# **2 ID/LP format**

HPSG was developed out of Generalized Phrase Structure Grammar (GPSG) and Categorial Grammar (Ajdukiewicz 1935; Pollard 1984; Steedman 2000; see also Flickinger, Pollard & Wasow 2021, Chapter 2 of this volume on the history of HPSG). The ideas concerning linearization of daughters in a local tree were taken over from GPSG (Gazdar, Klein, Pullum & Sag 1985: Section 3.2). In GPSG a separation between immediate dominance and linear precedence is assumed. So, while in classical phrase structure grammar, a phrase structure rule like (6) states that the NP[nom], NP[dat] and NP[acc] have to appear in exactly this order, this is not the case in GPSG and HPSG:

(6) S → NP[nom] NP[dat] NP[acc] V

The HPSG schemata corresponding to the immediate dominance rule (ID rule) in (6) do not express information about ordering. Instead, there are separate linear precedence (LP) rules (also called linearization rules). A schema like (6) licenses 24 different orders: the six permutations of the three arguments that were shown in (3) and all possible placements of the verb (to the right of NP[acc], between NP[dat] and NP[acc], between NP[nom] and NP[dat], to the left of NP[nom]). Orders like NP[nom], NP[dat], V, NP[acc] are not attested in German and hence these orderings have to be filtered out.<sup>1</sup> This is done by linearization rules, which can refer to features or to the function of a daughter in a schema. (7) shows some examples of linearization rules:

(7) a. X < V
b. X < V[INI−]
c. X < Head[INI−]

The first rule says that all constituents have to precede a V in the local tree. The second rule says that all constituents have to precede a V that has the INITIAL value −. One option to analyze German would be the one that was suggested by Uszkoreit (1987: Section 2.3) within the framework of GPSG: one could allow for two linearization variants of finite verbs. So in addition to the INI− variant of verbs there could be an INI+ variant, and this variant would be linearized initially. This reduces the number of orders licensed by (6) and the LP rules to 12: verb-initial placement with the 6 permutations of the NPs, and verb-final placement with the 6 permutations of the arguments. The ID rule in (6) together with the two linearization rules linearizing the verb in initial or final position therefore licenses the same orders as the following twelve phrase structure rules would do:

(8) a. S → NP[nom] NP[dat] NP[acc] V
S → NP[nom] NP[acc] NP[dat] V
S → NP[acc] NP[nom] NP[dat] V
S → NP[acc] NP[dat] NP[nom] V
S → NP[dat] NP[nom] NP[acc] V
S → NP[dat] NP[acc] NP[nom] V
b. S → V NP[nom] NP[dat] NP[acc]
S → V NP[nom] NP[acc] NP[dat]
S → V NP[acc] NP[nom] NP[dat]
S → V NP[acc] NP[dat] NP[nom]
S → V NP[dat] NP[nom] NP[acc]
S → V NP[dat] NP[acc] NP[nom]

<sup>1</sup>Extraposition of NPs is possible in German (Müller 1999: Section 13.1.1.3, 13.1.2.3; 2002a: ix–xi), although it is marked. Extraposition is a non-local dependency and hence treated by a different mechanism. Like fronted NPs in V2 sentences, extraposed NPs are not affected by the linearization rules stated here. See Keller (1995), Müller (1999: Chapter 13) and Borsley & Crysmann (2021: Section 8), Chapter 13 of this volume on extraposition.
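
The filtering effect of the LP rules can be verified mechanically: generate every order the ID rule in (6) licenses and keep those satisfying the two verb-placement variants of (7b). A small Python sketch:

```python
from itertools import permutations

# Daughters of the ID rule in (6); the ID rule itself says nothing
# about their order.
DAUGHTERS = ["NP[nom]", "NP[dat]", "NP[acc]", "V"]

def satisfies_lp(order) -> bool:
    # Two linearization variants of the finite verb (Uszkoreit 1987):
    # V[INI+] precedes all sisters, V[INI-] follows all sisters.
    position = order.index("V")
    return position == 0 or position == len(order) - 1

all_orders = list(permutations(DAUGHTERS))
licensed = [o for o in all_orders if satisfies_lp(o)]

print(len(all_orders))  # 24 orders licensed by the bare ID rule
print(len(licensed))    # 12 orders remain after LP filtering
```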

Note that we do not need a linearization rule for every ID rule. For example, in a grammar with rules for intransitive, transitive, and ditransitive verbs, head ordering is taken care of by general LP rules of the type in (7b) applying to the respective ID rules. The LP rule in (7c) is even more general than (7b) in that it does not mention the part of speech but instead refers to the function of the constituent. The rule says that a head that has the INI value '−' has to be linearized to the right of all other elements in the local tree. Hence, it also applies to adjectives and postpositions and their dependents.

This separation of linearization rules from phrase structure rules also makes it possible to capture other generalizations. For example, short elements tend to precede heavy constituents (Behaghel's Law of Increasing Constituents, Behaghel 1909: 139). Uszkoreit (1987: Chapter 5) captured one aspect of this more general rule by formulating a linearization statement requiring that pronouns precede non-pronouns. The LP rules apply to a large set of ID rules, for example to those for intransitive, transitive, and ditransitive verbs. By factoring out the LP constraints, generalizations over the whole set of phrase structure rules are covered. Uszkoreit's constraints on the order of arguments in the so-called *Mittelfeld* (that is, for rules like (8)) are assumed to be violable. While violable constraints are not part of the standard HPSG formalism, they are desirable and remain an area of ongoing work. See also Abeillé & Godard's work on weight-based linearization and the (reduced) mobility of various categories: bare nominals in various languages, certain pronouns (Abeillé & Godard 1999a), certain adverbs (Abeillé & Godard 2001), negation (Abeillé & Godard 1997; 2004), and attributive adjectives (Abeillé & Godard 1999b). In various papers, Abeillé & Godard propose a three-valued WEIGHT feature to account for the ordering of light, middle-weight, and heavy constituents (Abeillé & Godard 2000; 2004). See also Godard & Samvelian (2021: Section 4.3), Chapter 11 of this volume on complex predicates and weight.

This treatment of constraints on linearization has an advantage that was already pointed out by researchers working in GPSG: it captures the generalizations regarding linearization. For instance, the order of verbs with respect to their arguments is the same in embedded sentences in German, independent of the finiteness of the verb. Hence, as was explained above, one LP statement captures the generalization about argument-head order for examples like (9):


(9) a. dass er dem Mann das Buch gibt
       that he.NOM the.DAT man the.ACC book gives
       'that he gives the man the book'
    b. dass er versucht, [dem Mann das Buch zu geben]
       that he.NOM tried the.DAT man the.ACC book to give
       'that he tried to give the man the book'

The generalizations about the linearization of arguments with respect to each other are also captured. For example, the relative order of the dative and accusative object in (9) is the same in both environments. The constraints regarding linearization hold across rules. By factoring these constraints out, generalizations regarding constituent order can be captured. See Uszkoreit (1987: Section 3.1) for weighted constraints for the ordering of constituents in the *Mittelfeld*.

Furthermore, cross-linguistic generalizations about constituent structure can be captured. For example, the two phrase structure rules in (10) would be needed for head-initial and head-final languages, respectively:

(10) a. VP → V NP NP
     b. VP → NP NP V

In an ID/LP framework only one ID rule is needed to describe both sorts of languages. The linearization of the head is factored out of the rules.

Similarly, HPSG has just one schema for Head-Adjunct structures, although languages like English have some adjuncts that precede their heads and others that follow them. The schema in (11) corresponds to a phrase structure rule in GPSG. The values of features like HEAD-DTR and NON-HEAD-DTRS are feature descriptions that correspond to daughters in local trees or to symbols on right-hand sides of phrase structure rules (see Abeillé & Borsley 2021: 8, Chapter 1 of this volume for the representation of dominance structure in HPSG). The schema in (11) does not say anything about the order of the daughters:

(11) Head-Adjunct Schema:
*head-adjunct-phrase* ⇒
$$\begin{bmatrix}\text{HEAD-DTR}\begin{bmatrix}\text{SYNSEM } \boxed{1}\end{bmatrix}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{SYNSEM|LOC|CAT}\begin{bmatrix}\text{HEAD|MOD } \boxed{1}\\ \text{SPR } \langle\rangle\\ \text{COMPS } \langle\rangle\end{bmatrix}\end{bmatrix}\right\rangle\end{bmatrix}$$

There is a head daughter and a list of non-head daughters. The respective daughters are specified as the value of a feature or as an element in a list but they are


not ordered with respect to each other in the schema. Ordering is taken care of by the two LP rules in (12), which say that adjuncts marked as pre-modifiers (e.g., attributive adjectives) have to precede their head while those that are marked as post-modifiers (e.g., noun-modifying prepositions) follow it:

(12) a. Adjunct[PRE-MODIFIER +] < Head
     b. Head < Adjunct[PRE-MODIFIER −]

In general, there are two options for two daughters: head-initial and head-final order. Examples are given in (13):<sup>2</sup>

(13) a. head-initial order:
$$\begin{bmatrix}\text{PHON } \boxed{1} \oplus \boxed{2}\\ \text{HEAD-DTR|PHON } \boxed{1}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{PHON } \boxed{2}\end{bmatrix}\right\rangle\end{bmatrix}$$

     b. head-final order:
$$\begin{bmatrix}\text{PHON } \boxed{2} \oplus \boxed{1}\\ \text{HEAD-DTR|PHON } \boxed{1}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{PHON } \boxed{2}\end{bmatrix}\right\rangle\end{bmatrix}$$
When linearization rules enforce head-initial order, as in the case of modification by a PP in English, the PHON value of the head daughter is concatenated with the PHON value of the non-head daughter, and if the order has to be the other way around as in the case of adjectives modifying nouns, the non-head daughter is concatenated with the head daughter. An adjective is specified as PRE-MODIFIER + and a preposition as PRE-MODIFIER −. Since these features are head-features (see Abeillé & Borsley (2021: 22), Chapter 1 of this volume on head features), they are also accessible at the level of adjective phrases and prepositional phrases.
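A minimal sketch of this PHON computation, under the simplifying assumptions that PHON values are Python lists of word strings and that PRE-MODIFIER is a boolean (both encodings are mine, for illustration only):

```python
def phon_head_adjunct(head_phon, adjunct_phon, pre_modifier):
    """Compute the mother's PHON value in a head-adjunct structure: the
    PRE-MODIFIER value decides which concatenation the LP rules enforce."""
    if pre_modifier:                 # e.g., attributive adjectives
        return adjunct_phon + head_phon
    return head_phon + adjunct_phon  # e.g., noun-modifying prepositions

print(phon_head_adjunct(["book"], ["interesting"], pre_modifier=True))
print(phon_head_adjunct(["book"], ["on", "the", "table"], pre_modifier=False))
```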

For languages with free variation in head-adjunct order, it would suffice not to state any LP rule: both orders would then be licensed by the same Head-Adjunct Schema. So, the separation of immediate dominance and linear precedence allows for an underspecification of order. Therefore, HPSG grammarians are not forced to assume several different constructions for attested patterns or derivational processes that derive one order from another, more basic one.

# **3 Flat and binary branching structures**

The previous section discussed LP rules and used flat phrase structure rules for illustration. The corresponding flat structures are also used in HPSG. (14) shows

<sup>2</sup>⊕ (append) is a relational constraint that concatenates two lists.


a Head-Complement schema that combines a head with all the complements selected via the COMPS list.<sup>3</sup>


(14) Head-Complement Schema:
*head-complement-phrase* ⇒
$$\begin{bmatrix}\text{SYNSEM|LOC|CAT|COMPS } \langle\rangle\\ \text{HEAD-DTR}\begin{bmatrix}\text{SYNSEM|LOC|CAT|COMPS } \boxed{1}\end{bmatrix}\\ \text{NON-HEAD-DTRS } \textit{synsem2signs}(\boxed{1})\end{bmatrix}$$

synsem2signs is a relational constraint mapping a list of *synsem* objects as they are contained in the COMPS list onto a list of objects of type *sign* as they are contained in HEAD-DTR and NON-HEAD-DTRS (see Ginzburg & Sag 2000: 34 for a similar proposal).<sup>4</sup> How this schema can be used to analyze VPs like the one in (15) is shown in Figure 1.

(15) Kim gave Sandy a book.

Figure 1: Analysis of the VP *gave Sandy a book* with a flat structure

HPSG differs from purely phrase structure-based approaches in that the form of a linguistic object is not simply the concatenation of the forms associated with

<sup>3</sup>Ginzburg & Sag (2000: 4) assume a list called DTRS for all daughters including the head daughter. It is useful to be able to refer to specific non-head daughters without having to know their position in a list. For example, in head-adjunct structures the adjunct is the selector. So I keep DTRS for a list of ordered daughters and HEAD-DTR and NON-HEAD-DTRS for material that is not necessarily ordered with respect to each other. In the case of binary branching structures like head-adjunct structures, head-filler structures, head-specifier structures, and head-complement structures, the non-head daughter is the sole member of the NON-HEAD-DTRS list.

<sup>4</sup>In Sign-Based Construction Grammar (SBCG; Sag 2012), the objects in valence lists are of the same type as the daughters. A relational constraint would not be needed in this variant of HPSG theory (see Abeillé & Borsley 2021: Section 7.2, Chapter 1 of this volume and Müller 2021d: Section 1.3.2, Chapter 32 of this volume for further discussion of SBCG). Theories working with a binary branching Head-Complement Schema such as (19) below would not need the relational constraint either, since the *synsem* object in the COMPS list can be shared directly with the SYNSEM value of the element in the list of non-head daughters.


the terminal symbols in a tree (words or morphemes). Every linguistic object has its own phonological representation. So in principle one could design theories in which the combination of *Mickey Mouse* and *sleeps* is pronounced as *Donald Duck laughs*. Of course, this is not done. The computation of the PHON value of the mother is dependent on the PHON values of the daughters. But the fact that the PHON value of a linguistic sign is not necessarily a strict concatenation of the PHON values of the daughters can be used to model languages with a less strict order than English. Pollard & Sag (1987: 168) formulate the Constituent Order Principle, which is given as (16) in adapted form:

(16) Constituent Order Principle:

$$\textit{phrase} \Rightarrow \begin{bmatrix}\text{PHON } \textit{order-constituents}(\boxed{1})\\ \text{DTRS } \boxed{1}\end{bmatrix}$$

DTRS is a list of all daughters including the head daughter (if there is one). This setting makes it possible to have the daughters in the order in which the elements are ordered in the COMPS list (primary object, secondary object, and obliques) and then compute a PHON value in which the secondary object precedes the primary object. French is a language with freer constituent order than English, and such flat structures with appropriate reorderings are suggested by Abeillé & Godard (2000). For English, the function order-constituents would just return a concatenation of the PHON values of the daughters, but for other languages it would be much more complicated. In fact, this function and its interaction with linear precedence constraints were never worked out in detail.

Researchers working on English and French usually assume a flat structure (Pollard & Sag 1994: 39–40, 362; Sag 1997: 479; Ginzburg & Sag 2000: 34; Abeillé & Godard 2000) but assuming binary branching structures would be possible as well, as is clear from analyses in Categorial Grammar, where binary combinatory rules are assumed (Ajdukiewicz 1935; Steedman 2000). For languages like German it is usually assumed that structures are binary branching (but see Reape 1994: 156 and Bouma & van Noord 1998: 51). The reason for this is that adverbials can be placed anywhere between the arguments, as the following example from Uszkoreit (1987: 145) shows:

(17) *Gestern hatte in der Mittagspause der Vorarbeiter in der Werkzeugkammer dem Lehrling aus Boshaftigkeit langsam zehn schmierige Gußeisenscheiben unbemerkt in die Hosentasche gesteckt.*
     yesterday had during the lunch.break the foreman in the tool.shop the apprentice maliciously slowly ten greasy cast.iron.disks unnoticed in the pocket put
     'Yesterday during the lunch break, the foreman maliciously put ten greasy cast iron disks slowly into the apprentice's pocket unnoticed.'


A way to straightforwardly analyze adjunct placement in German and Dutch is to assume that adjuncts can attach to any verbal projection. For example, Figure 2 shows the analysis of (18):

(18) weil deshalb jemand gestern dem Kind schnell das Buch gab
     because therefore somebody yesterday the child quickly the book gave
     'because somebody quickly gave the child the book yesterday'

Figure 2: Analysis of [*weil*] *deshalb jemand gestern dem Kind schnell das Buch gab* 'because somebody quickly gave the child the book yesterday' with binary branching structures

The adverbials *deshalb* 'therefore', *gestern* 'yesterday' and *schnell* 'quickly' may attach to any verbal projection. For example, *gestern* could also be placed at the other adjunct positions in the clause.

Binary branching structures with attachment of adjuncts to any verbal projection also account for recursion and hence for the fact that arbitrarily many adjuncts can attach to a verbal projection. Of course, it is possible to formulate analyses with flat structures that involve arbitrarily many adjuncts (Kasper 1994; van Noord & Bouma 1994; Abeillé & Godard 2000: Section 5; Bouma et al. 2001: Section 4), but these analyses involve relational constraints in schemata or in lexical items, or an infinite lexicon. In Kasper's analysis, the relational constraints walk through lists of daughters of unbounded length in order to compute the semantics. In the other three analyses, (some) adjuncts are treated as valents, which may be problematic because of scope issues. This cannot be dealt with in detail here, but see Levine & Hukari (2006: Section 3.6) and Chaves (2009) for discussion.

The following schema licenses binary branching head-complement phrases:

(19) Head-Complement Schema (binary branching):
*head-complement-phrase* ⇒
$$\begin{bmatrix}\text{SYNSEM|LOC|CAT|COMPS } \boxed{1} \oplus \boxed{2}\\ \text{HEAD-DTR}\begin{bmatrix}\text{SYNSEM|LOC|CAT|COMPS } \boxed{1} \oplus \left\langle \boxed{3} \right\rangle \oplus \boxed{2}\end{bmatrix}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{SYNSEM } \boxed{3}\end{bmatrix}\right\rangle\end{bmatrix}$$

The COMPS list of the head daughter is split into three lists: a beginning (1), a list containing 3, and a rest (2). 3 is identified with the SYNSEM value of the non-head daughter. All other elements of the COMPS list of the head daughter are concatenated, and the result of this concatenation (1 ⊕ 2) is the COMPS list of the mother node. This schema is very general. It works for languages that allow for scrambling, since it allows an arbitrary element to be taken out of the COMPS list of the head daughter and realized in the local tree. The schema can also be "parameterized" to account for languages with fixed word order. For head-final languages with fixed order, 2 would be the empty list (= combination with the last element in the list) and for head-initial languages with fixed order (e.g., English), 1 would be the empty list (= combination with the first element in the list). Since the elements in the COMPS list are ordered in the order of Obliqueness (Keenan & Comrie 1977; Pullum 1977) and since this order corresponds to the order in which the complements are serialized in English, the example in (15) can be analyzed as in Figure 3.<sup>5</sup> The second tree in the figure is the German counterpart of *gave Sandy a book*: the finite verb in final position with its two objects in normal order. Section 4 explains why SOV languages like German and Japanese contain their subject in the COMPS list while SVO languages like English and the Romance languages do not.
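The splitting of the COMPS list in (19) and its "parameterization" can be sketched as follows. The encoding is my own and purely illustrative: complements are strings, and the generator yields pairs of the realized complement and the mother's remaining COMPS list.

```python
def binary_combinations(comps, order="free"):
    """Sketch of (19): split COMPS as 1 + <3> + 2, realize 3 as the non-head
    daughter, and pass 1 + 2 up as the mother's COMPS list."""
    for i, daughter in enumerate(comps):
        before, after = comps[:i], comps[i + 1:]
        if order == "head-initial" and before:  # 1 must be empty: take first
            continue
        if order == "head-final" and after:     # 2 must be empty: take last
            continue
        yield daughter, before + after

# Free order (scrambling): any of the three complements may combine first.
print(list(binary_combinations(["NP1", "NP2", "NP3"])))
# Fixed head-initial order (e.g., English): only the first element qualifies.
print(list(binary_combinations(["NP1", "NP2", "NP3"], order="head-initial")))
```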

<sup>5</sup>This structure may seem strange to those working in Mainstream Generative Grammar (MGG, GB/Minimalism). In MGG, different branchings are assumed, since the form of the tree plays a role in Binding Theory. This is not the case in HPSG: Binding is done on the ARG-ST list. See Müller (2021a), Chapter 20 of this volume for a discussion of HPSG's Binding Theory and Borsley & Müller (2021), Chapter 28 of this volume for a comparison between HPSG and Minimalism.


Figure 3: Analysis of the English VP *gave Sandy a book* and the corresponding German verbal projection *Sandy ein Buch gab* with binary branching structures

The alternative to using relational constraints as the two appends in the schema in (19) is to use sets rather than lists for the representation of valence information (Gunji 1986: Section 4; Hinrichs & Nakazawa 1989: 8; Pollard 1996: 296; Oliva 1992a: 187; Engelkamp, Erbach & Uszkoreit 1992: 205). The Head-Complement Schema would combine the head with one of its complements. Since the elements of a set are not ordered, any complement can be taken and hence all permutations of complements are accounted for.

The disadvantage of set-based approaches is that sets do not impose an order on their members, but an order is needed for various subtheories of HPSG (see Przepiórkowski (2021), Chapter 7 of this volume on case assignment, and Müller (2021a), Chapter 20 of this volume on Binding Theory). In the approach proposed above and in Müller (2005a: 7; 2015a: 945; 2015c: 53–54), the valence lists are ordered but the schema allows for combination with any element of the list. For valence representation and the order of elements in valence lists see Davis, Koenig & Wechsler (2021), Chapter 9 of this volume.

# **4 SVO vs. SOV**

The careful reader will have noticed that the COMPS list of *gave* in Figure 3 contains the two objects, while its German counterpart *gab* has three elements in the COMPS list. The rationale behind this difference is explained in this section.

In principle, one could assume a rule like (6) for SVO languages like English as well. The SVO order would then be accounted for by linearization rules stating that NP[*nom*] precedes the finite verb while other arguments follow it. This would get the facts about simple sentences like (20a) right but would leave the analysis of (20b) open.


(20) a. Peter reads books.
     b. Peter often reads books.

The generalization about languages like English is that adverbials can appear to the left of verbs or to the right of the verbs' complements, that is, to the left or to the right of the unit formed by verbs and complements: the VP. Researchers like Borsley (1987) argued that subjects, specifiers, and complements differ in crucial ways and should be represented by special (valence) features. For example, the subject of the VP *to read more books* in (21) is not realized but is referred to in Control Theory (Abeillé 2021, Chapter 12 of this volume).

(21) Peter tries to read more books.

The subject in English main clauses is similar to the determiner in nominal structures, so one way of expressing this similarity is by using the same valence features and the same schema for subject-VP combinations as for determiner-noun combinations.<sup>6</sup> The schema is given here as (22):

(22) Specifier-Head Schema:

*specifier-head-phrase* ⇒
$$\begin{bmatrix}\text{SYNSEM|LOC|CAT|SPR } \boxed{1}\\ \text{HEAD-DTR|SYNSEM|LOC|CAT}\begin{bmatrix}\text{SPR } \boxed{1} \oplus \left\langle \boxed{2} \right\rangle\\ \text{COMPS } \langle\rangle\end{bmatrix}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{SYNSEM } \boxed{2}\end{bmatrix}\right\rangle\end{bmatrix}$$

The last element of the SPR list is realized as the non-head daughter. The remaining list is passed up to the mother node. Note that the non-head daughter is taken from the end of the SPR list. For heads that have exactly one specifier, this difference is irrelevant, but in the analysis of object shift in Danish suggested by Müller & Ørsnes (2013), the authors assume multiple specifiers, and hence the order of combination is relevant. The head daughter must have an empty COMPS list. This ensures that verbs form a unit with their objects (the VP) and that the subject is combined with the VP, rather than the subject combining with a lexical verb and this combination combining with the objects later.

The analysis of the sentence in (23) including the analysis of the NP *a book* is given in Figure 4.

<sup>6</sup>This is non-standard in HPSG. Usually the SUBJECT feature is used for subjects and SPR for determiners (but see Sag, Wasow & Bender (2003: 100–103), where subjects are also selected via SPR). I follow the German HPSG tradition and use SUBJ for unexpressed subjects. See also Van Eynde (2021), Chapter 8 of this volume for alternative analyses of nominal structures that do not assume a selection of the determiner by the noun. The proposal suggested here captures the parallelism between the sentential and the nominal domain (Machicao y Priemer & Müller 2021), a goal of analyses in GB/Minimalism since Abney (1987).


(23) Kim gave Sandy a book.

Figure 4: Analysis of *Kim gave Sandy a book* with SPR and COMPS feature and a flat VP structure

For German, it is standardly assumed that the subjects of finite verbs are treated like complements (Pollard 1996: 295–296; Kiss 1995: Section 3.1.1) and hence are represented on the COMPS list (as in Figure 3). The assumption that arguments of German finite verbs are complements is also made by researchers working in different research traditions (e.g., Eisenberg 1994: 376). The assumption that the subject is listed among the complements of a verb explains why it can be placed in any position before, between, and after them.<sup>7</sup> So in summary, German differs

<sup>7</sup>An alternative way of accounting for the orders would be to keep the special feature for subjects and allow subjects to combine with non-maximal verbal projections. The Head-Specifier Schema in (22) would lack the constraint that the head daughter be COMPS ⟨⟩. However, this would cause problems in the analysis of structures with the head in the middle. The standard analysis of (i) combines the head *Bild* 'picture' with the PP complement first and then the result *Bild von Kim* with the determiner.

(i) das Bild von Kim
    the picture of Kim

If the constraint that the head daughter in head-specifier structures has to have an empty COMPS list is removed, two analyses are possible: the determiner can be combined with the noun first and the *von*-PP can be added later. This kind of spurious ambiguity is usually avoided.


from English in the way the arguments are distributed on the valence lists, in order to capture the similarity in English between combinations of subjects with VPs and determiners with nouns, and to allow German the flexible constituent order it needs. However, HPSG has a more basic representation in which the languages do behave the same: the argument structure represented on the ARG-ST list. The ARG-ST list contains *synsem* objects and is used for linking (Davis, Koenig & Wechsler 2021, Chapter 9 of this volume), case assignment (Przepiórkowski 2021, Chapter 7 of this volume), and binding (Müller 2021a, Chapter 20 of this volume). Ditransitive verbs in German and English have three NP arguments on their ARG-ST and they are linked in the same way to the semantic representation (Müller 2018: 62; 2021c). (24) shows the mapping from ARG-ST to SPR and COMPS:

(24) a. *gives* (English, SVO language):
$$\begin{bmatrix}\text{SPR } \boxed{1}\\ \text{COMPS } \boxed{2}\\ \text{ARG-ST } \boxed{1}\left\langle \text{NP} \right\rangle \oplus \boxed{2}\left\langle \text{NP, NP} \right\rangle\end{bmatrix}$$

     b. *gibt* (German, SOV language):
$$\begin{bmatrix}\text{SPR } \langle\rangle\\ \text{COMPS } \boxed{1}\\ \text{ARG-ST } \boxed{1}\left\langle \text{NP, NP, NP} \right\rangle\end{bmatrix}$$

In SVO languages, the first element of the ARG-ST list is mapped to SPR and all others to COMPS; in languages without a designated subject position, all ARG-ST elements are mapped to COMPS.
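A sketch of the mapping in (24), under the simplifying assumption that ARG-ST, SPR, and COMPS are plain Python lists (the function name is mine, for illustration):

```python
def map_arg_st(arg_st, svo):
    """Distribute the ARG-ST list over the valence features: SVO languages
    map the first argument to SPR; languages without a designated subject
    position, like German, map all arguments to COMPS."""
    if svo:
        return {"SPR": arg_st[:1], "COMPS": arg_st[1:]}
    return {"SPR": [], "COMPS": list(arg_st)}

print(map_arg_st(["NP", "NP", "NP"], svo=True))   # gives: SPR <NP>, COMPS <NP, NP>
print(map_arg_st(["NP", "NP", "NP"], svo=False))  # gibt: SPR <>, COMPS <NP, NP, NP>
```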

Having explained scrambling in HPSG and the order of subjects in SVO languages, I now turn to "head movement".

# **5 Head movement vs. constructional approaches that assume flat structures**

The Germanic languages signal clause type by verb position. All Germanic languages with the exception of English are V2 languages: the finite verb is in second position in declarative main clauses. The first position can be filled by any other constituent, for example a subject, objects, or adverbials. (25) shows an example from the V2 language German and its English translation.

(25) Eigentlich mag ich Katzen sehr. (German)
     actually like I cats really
     'I actually really like cats.'

The fronted material is not necessarily from the matrix clause; non-local dependencies crossing clause boundaries are possible. The same holds for questions with *w*-phrases.


Yes/no questions are formed by putting the verb in initial position:

(26) Magst du Katzen? (German)
     like you cats
     'Do you like cats?'

English is a so-called *residual V2 language* (Rizzi 1990), that is, there are some constructions that are parallel to what is known from V2 languages. For example, while declarative clauses are in base order (SVO), questions follow the pattern that is known from other Germanic languages with the finite verb in second position.<sup>8</sup>

(27) What will Kim read \_?

Analyses assuming flat structures (or flat linearization domains, see Section 6) usually treat alternative orders of verbs in Germanic languages as linearization variants (Reape 1994; Kathol 2001; Müller 1995; 2003b; Bjerre 2006), but this is not necessarily so, as Bouma and van Noord's analysis of Dutch clauses shows (Bouma & van Noord 1998: 62, 71). The alternative to verb placement as linearization is something similar to verb movement in Government & Binding: an empty element occupies the verb's canonical position and the verb is realized in initial or – if something is realized before the finite verb – in second position. The following subsection deals with such approaches in more detail. Subsection 5.2 deals with a constructional approach.

# **5.1 Head movement approaches**

Building on work by Jacobson (1987) in the framework of Categorial Grammar, Borsley (1989) showed that in addition to the analysis of auxiliary inversion in English that was suggested in GPSG (Gazdar et al. 1985: Section 4.3), an analysis similar to the movement-based analysis in GB is possible in HPSG as well. Head movement analyses in GPSG and HPSG are concerned with verb placement in pairs such as the one in (28), rather than with adverb placement as in GB analyses of head movement by Pollock (1989) and Cinque (1999).

(28) a. Will Kim get the job?
     b. Kim will get the job.

<sup>8</sup>SVO is not V2 although the verb is in second position in SVO sentences. Languages can be categorized into SOV, SVO, VSO, OSV, OVS, and VOS languages and into V2 or non-V2 languages. These two dimensions are independent. For example, Danish is an SVO language that is V2, while German is SOV and V2 (Haftka 1996; Haider 2020). See Müller (2021c) for discussion and the analysis of this variation in HPSG.


The technique used in Borsley's analysis is basically the same as the one developed by Gazdar (1981) for the treatment of nonlocal dependencies in GPSG. An empty category is assumed and the information about the missing element is passed up the tree until it is bound off at an appropriate place (that is, by the fronted verb). Note that the heading of this section contains the term *head movement* and I talk about traces, but it is not the case that something is actually moved. There is no underlying structure with a verb after the subject that is transformed into one with the verb fronted and a remaining trace in the verb's original position. Instead, the empty element is a normal element in the lexicon and can function as the verb in the respective position. The analysis of (28a) is shown in Figure 5.

Figure 5: Analysis of English auxiliary constructions as head movement following Borsley (1989)

A special variant of the auxiliary is licensed by a unary rule. The unary rule has as its daughter the auxiliary as it appears in canonical SVO order as in (28b). It licenses an auxiliary selecting a full clause in which the daughter auxiliary (with the LOCAL value 2) is missing. The fact that the auxiliary is missing is represented as the value of DOUBLE SLASH (DSL). The value of DSL is a *local* object, that is, something that contains syntactic and semantic information (2 in Figure 5). DSL is a head feature and hence available everywhere


along a projection path (see Abeillé & Borsley (2021: 22), Chapter 1 of this volume for the Head Feature Principle). The empty element for head movement is rather simple:

(29) Empty element for head movement:

$$\begin{bmatrix}\textit{word}\\ \text{PHON } \langle\rangle\\ \text{SYNSEM|LOC } \boxed{1}\begin{bmatrix}\text{CAT|HEAD|DSL } \boxed{1}\end{bmatrix}\end{bmatrix}$$

 It states that there is an empty element that has the local requirements that correspond to its DSL value. For cases of verb movement it says: I am a verb that is missing itself.

Such head-movement analyses are assumed by most researchers working on German (Kiss & Wesche 1991: Section 4.7; Oliva 1992b; Netter 1992; Frank 1994; Kiss 1995: Section 2.2.4.2; Feldhaus 1997: Section 3.1.1.1; Meurers 2000: Section 5.1; Müller 2005a; 2021b) and also by Bouma & van Noord (1998: 62, 71) in their work on Dutch, by Müller & Ørsnes (2015) in their grammar of Danish, and by Müller (2021c) for Germanic in general.

# **5.2 Constructional approaches**

The alternative to head-movement-based approaches is a flat analysis with an alternative serialization of the verb. This was already discussed with respect to German, but I want to discuss English auxiliary constructions here, since they have figured prominently in linguistic discussions.<sup>9</sup> In the analysis of (30) shown in Figure 6, the auxiliary *did* selects for the subject *Kim* and a VP *get the job*.

(30) Did Kim get the job?

The tree in Figure 6 is licensed by a schema combining a head with its subject (1) and its VP complement (2) in one go.<sup>10</sup> As has been common in HPSG since the mid-1990s (Sag 1997), phrasal schemata are organized in type hierarchies, and the general schema for auxiliary-initial constructions has the type *aux-initial-cxt*. Fillmore (1999) and Sag et al. (2020) argue that there are various usages of auxiliary-initial constructions and assign the respective usages to subconstructions of the general auxiliary-initial construction. Technically this amounts to stating subtypes of *aux-initial-cxt*. For example, Sag et al. (2020: 116) posit a

<sup>9</sup>For a discussion including French verb placement see Abeillé & Godard (1997) and Kim & Sag (2002).

<sup>10</sup>An alternative is to assume a separate valence feature for the subject (SUBJ) and a schema that combines the head with the element in the SUBJ list and the elements in the COMPS list (Ginzburg & Sag 2000: 36).


Figure 6: Analysis of English auxiliary constructions based on Sag et al. (2020: 117)

subtype *polar-int-cl* for polar interrogatives like (31a) and another subtype *aux-initial-excl-cl* for exclamatives like (31b).

(31) a. Are they crazy?
     b. Are they crazy!

Chomsky (2010) compared the various clause types used in HPSG with the – according to him – much simpler Merge-based analysis in Minimalism. Minimalism assumes just one very general schema for combination (External Merge is basically equivalent to our Head-Complement Schema (19) above; see Müller 2013b: 937–939), so this rule for combining linguistic objects is very simple, but this does not help in any way when considering the facts: there are at least five different meanings associated with auxiliary-initial clauses (polar interrogatives, blessings/curses, negative imperatives, exclamatives, conditionals) and these have to be captured somewhere in a grammar. One way is to state them in a type hierarchy, as is done in some HPSG analyses and in Sign-Based Construction Grammar; another way is to use implicational constraints that assign various meanings to actual configurations (see Section 5.3); and a third way is to do everything lexically. The only option for Minimalism is the lexical one. This means that Minimalism has to either assume as many lexical items for auxiliaries as there are types in HPSG or assume empty heads that contribute the meaning that is contributed by the phrasal schemata in HPSG (Borsley 2006: Section 5; Borsley & Müller 2021: Section 4.1.5, Chapter 28 of this volume). The latter proposal is generally assumed in Cartographic approaches (Rizzi 1997). Since there is a fixed configuration of functional projections that contribute semantics, one could term these Rizzi-style analyses *Crypto-Constructional*.

Having discussed a lexical approach involving an empty element and a phrasal approach that can account for the various meanings of auxiliary inversion constructions, I turn now to a mixed approach in the next section and show how the various meanings associated with certain patterns can be integrated into accounts with rather abstract schemata for combinations like the one described in Section 5.1.

# **5.3 Mixed approaches**

The situation with respect to clause types is similar in German. Verb first sentences can be yes/no questions (32a), imperatives (32b), conditional clauses (32c), and declarative sentences with topic drop (32d).



(33) a. Wer kommt?
        who comes
        'Who is coming?'
     b. Peter kommt.
        Peter comes
        'Peter is coming.'
     c. Jetzt komm!
        now come
        'Come now!'

While one could try to capture this situation by assuming surface-order-related clause types, such approaches are rarely used in HPSG (but see Kathol (2001) and Wetta (2011), and see Section 6.4.2 on why such approaches are doomed to failure). Rather, researchers have assumed binary branching head-complement structures together with verb movement (for references see the end of Section 5.1).<sup>11</sup>

<sup>11</sup>I assumed linearization domains (see Section 6) for ten years and then switched to the head-movement approach (Müller 2005a,b; 2021b). For a detailed discussion of all alternative proposals and a fully worked out analysis see Müller (2021b).


As was explained in Section 5.1, the head movement approaches are based on lexical rules or unary projections. These license new linguistic objects that could contribute the respective semantics. In analogy to what Borsley (2006) has discussed with respect to extraction structures, this would mean that one needs seven versions of fronted verbs to handle the seven cases in (32) and (33), which would correspond to the seven phrasal types that would have to be stipulated in phrasal approaches. But there is a way out of this: one can assume one lexical item with underspecified semantics. HPSG makes it possible to use implicational constraints referring to a structure in which an item occurs. Depending on the context, the semantics contributed by a specific item can be further specified. Figure 7 shows the construction-based and lexical-rule-based analyses in the abstract for comparison.

(a) Phrasal construction (b) Unary construction and implication

Figure 7: Construction-based, phrasal approach and approach with implicational constraint

In the construction-based analysis, the daughters contribute x and y as semantic values and the whole construction adds the construction meaning z. In the lexical-rule- or unary-projection-based analysis, the lexical rule/unary projection adds the z, and the output of the rule is combined with the other daughter without any contribution by a specialized phrasal construction. Now, implicational constraints can be used to determine the exact contribution of the lexical item (Müller 2015b). This is shown with the example of a question in Figure 8.

Figure 8: Implication for interrogative sentences

The implication says: when the configuration is such that there is a question pronoun in the left daughter, the projection resulting from the combination of the output of the lexical rule with the VP selected by the initial verb gets question semantics. Since HPSG represents all linguistic information in the same attribute value matrix (AVM), such implicational constraints can refer to intonation as well, and hence implications establishing the right semantics for V1 questions (32a) vs. V1 conditionals (32c) can be formulated.<sup>12</sup>

# **6 Constituent order domains and linearization**

There is an interesting extension to standard HPSG that opens up possibilities for analyses that are quite different from what is usually done in theoretical linguistics: Mike Reape (1991; 1992; 1994), working on German, suggested formal tools that allow for the modeling of discontinuous constituents.<sup>13</sup> His original motivation was to account for scrambling of arguments of verbs forming verbal complexes, but this analysis was superseded by Hinrichs and Nakazawa's analysis (Hinrichs & Nakazawa 1989; 1994), since purely linearization-based approaches are unable to account for agreement and the so-called remote passive (Kathol 1998: Sections 5.1–5.2; Müller 1999: Section 21.1). Nevertheless, Reape's work was taken up by others and was used for analyzing German (Kathol & Pollard 1995; Kathol 2000; Müller 1995; 1996; 2004; Wetta 2011; 2014). As will

<sup>12</sup>Note that coordination examples like (i) do not pose a problem:

(i) Kim [kennt und liest] das Buch.
    Kim knows and reads the book
    'Kim knows and reads the book.'

The unary schema applies to the conjunction of the two verbs. However, the situation is different for examples like (ii):

(ii) Kim [kennt [die Schallplatte \_]] und [liest [das Buch \_]].
     Kim knows the record and reads the book
     'Kim knows the record and reads the book.'

The selection of the verbless verb phrase takes place in the conjuncts, but the semantics of the clause is determined at the top-most level when *Kim* is combined with the coordinated structure. It has to be ensured that information about the syntactic combination of the verb-initial verb, about morphological information (imperative vs. indicative), and about intonation is available at the coordinated structure. This information will be affected by the implicational constraint and is inserted at a place where it scopes over the coordination relation.

An alternative to the underspecification + implicational constraints account would be to add the semantics contributed by clause types via a unary rule applying to the complete clause, as in Ginzburg & Sag (2000: 266–267).

<sup>13</sup>See also Wells (1947: 105–106), Dowty (1996), and Blevins (1994) for proposals assuming discontinuous constituents in other frameworks.


be discussed below in Section 6.4, there are reasons for abandoning linearization-based analyses of German that assume discontinuous constituents (Müller 2005b; 2021b: Chapter 6) but constituent order domains still play a role in analyzing ellipsis (Nykiel & Kim 2021: 877, Chapter 19 of this volume) and coordination (Yatabe 2001; Crysmann 2003; Beavers & Sag 2004; Yatabe & Tam 2021; Abeillé & Chaves 2021: 749, 757, Chapter 16 of this volume). Bonami, Godard & Marandin (1999) show that complex predicate formation does not account for subject-verb inversion in French and suggest a domain-based approach. Bonami & Godard (2007), also working on French, propose an analysis of sentential adverbs within a domain-based approach.

# **6.1 A special representational layer for constituent order**

The technique that is used to model discontinuous constituents in frameworks like HPSG goes back to Mike Reape's work on German (1991; 1992; 1994). Reape uses a list called DOMAIN to represent the daughters of a sign in the order in which they are pronounced or written. (34) shows an example in which the DOM value of a headed-phrase is computed from the DOM value of the head and the list of non-head daughters.

$$\text{(34)}\quad \textit{headed-phrase} \Rightarrow \begin{bmatrix}\text{HEAD-DTR|DOM } \boxed{1}\\ \text{NON-HEAD-DTRS } \boxed{2}\\ \text{DOM } \boxed{1} \bigcirc \boxed{2}\end{bmatrix}$$

The symbol '◯' stands for the *shuffle* relation. *shuffle* relates three lists A, B, and C iff C contains all elements of A and B and the order of the elements of A and the order of the elements of B is preserved in C. (35) shows the combination of two lists with two elements each:

(35) ⟨*a*, *b*⟩ ◯ ⟨*c*, *d*⟩ = ⟨*a*, *b*, *c*, *d*⟩ ∨ ⟨*a*, *c*, *b*, *d*⟩ ∨ ⟨*a*, *c*, *d*, *b*⟩ ∨ ⟨*c*, *a*, *b*, *d*⟩ ∨ ⟨*c*, *a*, *d*, *b*⟩ ∨ ⟨*c*, *d*, *a*, *b*⟩

The result is a disjunction of six lists. *a* is ordered before *b* and *c* before *d* in all of these lists, since this is also the case in the two lists ⟨*a*, *b*⟩ and ⟨*c*, *d*⟩ that have been combined. But apart from this, *a* and *b* can be placed before, between, or after *c* and *d*.
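Since shuffle is just an interleaving relation, it is easy to compute. The following Python sketch (my own illustrative encoding, with lists of strings standing in for domain objects) enumerates all interleavings of two lists:

```python
def shuffle(a, b):
    """All interleavings of a and b that preserve the internal order of
    each list, i.e. the disjuncts of the shuffle relation in (35)."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return [[a[0]] + rest for rest in shuffle(a[1:], b)] + \
           [[b[0]] + rest for rest in shuffle(a, b[1:])]

print(shuffle(["a", "b"], ["c", "d"]))  # the six lists of (35)
```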


On the linearization-based approach, every word comes with a domain value that is a list that contains the word itself:

(36) Domain contribution of single words, here *gibt* 'gives':

$$\boxed{1}\begin{bmatrix}\text{PHON } \left\langle \textit{gibt} \right\rangle\\ \text{SYNSEM } \dots\\ \text{DOM } \left\langle \boxed{1} \right\rangle\end{bmatrix}$$

The description in (36) may seem strange at first glance, since it is cyclic, but it can be understood as a statement saying that *gibt* contributes itself to the items that occur in linearization domains.

The constraint in (37) is responsible for the determination of the PHON values of phrases:

(37) $$\textit{phrase} \Rightarrow \begin{bmatrix}\text{PHON } \boxed{1} \oplus \dots \oplus \boxed{n}\\ \text{DOM } \left\langle \begin{bmatrix}\text{PHON } \boxed{1}\end{bmatrix}, \dots, \begin{bmatrix}\text{PHON } \boxed{n}\end{bmatrix}\right\rangle\end{bmatrix}$$

It states that the PHON value of a sign is the concatenation of the PHON values of its DOMAIN elements. Since the order of the DOMAIN elements corresponds to their surface order, this is the obvious way to determine the PHON value of the whole linguistic object.

Figure 9 shows how this machinery can be used to license binary branching structures with discontinuous constituents in the sentence *dass dem Kind ein Mann das Buch gibt* 'that a man gives the child the book'. Words or word sequences that are separated by commas stand for separate domain objects, that is, ⟨*das*, *Buch*⟩ contains the two objects *das* and *Buch* and ⟨*das Buch*, *gibt*⟩ contains the two objects *das Buch* and *gibt*. The important point to note here is that the arguments in the tree are combined with the head in the order accusative, dative, nominative, although the elements in the constituent order domain (i.e. in the list of DOMAIN elements and in the surface sentence) are realized in the order dative, nominative, accusative, rather than nominative, dative, accusative, which is what one might expect based on the order in which they are combined in the tree. This is possible since the formulation of the computation of the DOM value using the shuffle operator allows for discontinuous constituents. The node for *dem Kind das Buch gibt* 'the child the book gives' is discontinuous: *ein Mann* 'a man' is inserted into the domain between *dem Kind* 'the child' and *das Buch* 'the book'. This is more obvious in Figure 10, which has a serialization of NPs that corresponds to their order.

Figure 9: Analysis of *dass dem Kind ein Mann das Buch gibt* 'that a man gives the child the book' with binary branching structures and discontinuous constituents. The tree shows the order of combination, which does not correspond to the linearization of the DOMAIN objects.

Figure 10: Analysis of *dass dem Kind ein Mann das Buch gibt* 'that a man gives the child the book' with binary branching structures and discontinuous constituents, more clearly showing the discontinuity


# **6.2 Absolutely free**

While German is more flexible than English in terms of constituent order, languages like Warlpiri are even more so, since they have much freer constituent order. In Warlpiri, the auxiliary has to be in first or second position (Laughren 1989: 322; Simpson 1991: 69, 99), but apart from this, even parts of what are noun phrases in German and English can appear separated from each other. For example, the two parts of the NP *Kurdujarrarlu witajarrarlu* 'child small' may appear discontinuously, since they are marked with the same case (Simpson 1991: 257):

(38) Kurdu-jarra-rlu ka-pala maliki wajili.pi-nyi wita-jarra-rlu.
     child-DU-ERG PRS-3DU.SBJ dog.ABS chase-NPST small-DU-ERG
     'Two small children are chasing the dog.' or 'Two children are chasing the dog and they are small.'

Donohue & Sag (1999) develop an analysis for this that simply liberates domain elements and inserts them into the next higher domain. (39) shows how this is formalized:

(39) *liberating-phrase* ⇒

$$\begin{bmatrix}\text{DOM } \boxed{0} \bigcirc \boxed{1} \bigcirc \dots \bigcirc \boxed{n}\\ \text{HEAD-DTR}\begin{bmatrix}\text{DOM } \boxed{0}\end{bmatrix}\\ \text{NON-HEAD-DTRS } \left\langle \begin{bmatrix}\text{DOM } \boxed{1}\end{bmatrix}, \dots, \begin{bmatrix}\text{DOM } \boxed{n}\end{bmatrix}\right\rangle\end{bmatrix}$$

Rather than inserting the entire daughters into the domain of the mother as in (34), the DOM values of the daughters are shuffled into the domain of the mother. So instead of having the NPs in the same domain as the verb, as in the German example in the previous section, one has all the parts of NPs in the next higher domain. Hence, a single nominal element being placed in front of the auxiliary in second position is explained without difficulty. Figure 11 shows Donohue & Sag's (1999) analysis of a version of (38) with the VP constituents *maliki wajilipinyi* 'dog chase' serialized after *witajarrarlu* 'small'. Here *kurdujarrarlu* 'child' and *witajarrarlu* 'small' form an NP. They contribute two independent domain objects (1 and 3) to the domain of the mother. The second element in this domain has to be the auxiliary (2); 1 is realized initially and 3 follows the auxiliary.
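The difference between (34), which inserts each daughter as a single domain object, and (39), which liberates the daughters' own domain objects into the mother's domain, can be sketched as follows (toy dictionaries of my own stand in for signs; only one of the orders licensed by shuffle is returned):

```python
def mother_domain(head_dom, non_head_daughters, liberate):
    """Insert each non-head daughter either as one compacted domain object
    (cf. (34)) or as its liberated domain objects (cf. (39))."""
    inserted = []
    for daughter in non_head_daughters:
        inserted.extend(daughter["DOM"] if liberate else [daughter])
    return head_dom + inserted

np = {"PHON": ["kurdu-jarra-rlu", "wita-jarra-rlu"],
      "DOM": [{"PHON": ["kurdu-jarra-rlu"]}, {"PHON": ["wita-jarra-rlu"]}]}
print(len(mother_domain([], [np], liberate=False)))  # 1: one compact object
print(len(mother_domain([], [np], liberate=True)))   # 2: two liberated objects
```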

We have seen so far an analysis that inserts complete objects into the domain of the mother (the analysis of German) and an analysis that inserts all domain objects of the daughters into the domain of the mother (the analysis of Warlpiri). In the next subsection I look at an intermediate case, so-called *partial compaction*.


# **6.3 Partial compaction (extraposition)**

Kathol & Pollard (1995) develop an analysis of extraposition that is a mix of the strategies discussed in the two previous subsections: most of one NP object is inserted into the domain of the mother as a single object, while only those parts that are extraposed are liberated and inserted as individual domain objects into the domain of the mother.<sup>14</sup> Kathol & Pollard's analysis of (40) is given in Figure 12.<sup>15</sup>


*einen Hund, der Hunger hat* 'a dog who is hungry' consists of three domain objects: *einen* 'a', *Hund* 'dog', and *der Hunger hat* 'who is hungry'. The first two are inserted as one object (the NP *einen Hund* 'a dog') into the higher domain and the relative clause is liberated. While the formation of the new domain at the

<sup>14</sup>This analysis of extraposition is not the only option available in HPSG. I explain it here since it shows the flexibility of the domain approach. The more common analysis of extraposition is one that is parallel to the SLASH-based approach to extraction that is explained in Borsley & Crysmann (2021), Chapter 13 of this volume. Since constraints regarding locality differ for fronting to the left and extraposition to the right, a different feature is used (EXTRA). See Keller (1995) and Müller (1999: Section 13.2) for discussion. More recent approaches assume the projection of semantic indices (Kiss 2005) to be able to solve puzzles like Link's (1984) hydra sentences and even more recent proposals mix index projection and EXTRA projection (Crysmann 2013).

<sup>15</sup>The figure is taken over from Kathol & Pollard. Words in italics are the object language. Part of speech or category labels are provided at the top of AVMs.



Figure 12: Analysis of extraposition via partial compaction of domain objects according to Kathol & Pollard (1995: 178)

mother node is relatively straightforward in the cases discussed so far, a complex relational constraint is needed to split the relative clause (3) from the other domain objects and construct a new domain object that has the determiner and the noun as constituents (2). Kathol & Pollard have a relational constraint called *compaction* that builds new domain objects for insertion into higher domains. *partial compaction* takes an initial part of a domain and forms a new domain object from this, returning the remaining domain objects for separate insertion into the higher domain. Due to space limitations, this constraint will not be discussed here, but see Müller (1999: 244) for a refined version of Kathol & Pollard's constraint. The effect of partial compaction in Figure 12 is that there is a new object 2 and a list containing the remaining objects, in the example ⟨3⟩. A list containing the new object ⟨2⟩ and the list containing the remaining objects ⟨3⟩


are shuffled with the domain list of the head (4). Since the relative clause is now in the same domain as the verb, it can be serialized to the right of the verb.

This subsection showed how examples like (40) can be analyzed by allowing for a discontinuous constituent consisting of an NP and a relative clause. Rather than liberating all daughters and inserting them into the domain of the mother node as in the Warlpiri example, determiner and noun form a new object, an NP, and the newly created NP and the relative clause are inserted into the domain of the mother node. This explains why determiner and noun have to stay together while the relative clause may be serialized further to the right.
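Under the same kind of toy encoding as above, partial compaction can be sketched as a function that fuses an initial part of a domain into a new object and returns the rest for separate insertion; the PHON concatenation merely stands in for the actual relational constraint, which also determines the category of the new object:

```python
def partial_compaction(dom, n_compact):
    """Fuse the first n_compact domain objects into one new object and
    liberate the remaining ones, cf. Kathol & Pollard (1995)."""
    compacted = {"PHON": sum((obj["PHON"] for obj in dom[:n_compact]), [])}
    return compacted, dom[n_compact:]

np_dom = [{"PHON": ["einen"]}, {"PHON": ["Hund"]},
          {"PHON": ["der", "Hunger", "hat"]}]
new_object, liberated = partial_compaction(np_dom, 2)
print(new_object["PHON"])              # ['einen', 'Hund']
print([o["PHON"] for o in liberated])  # [['der', 'Hunger', 'hat']]
```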

# **6.4 Problems with order domains**

Constituent order domains may seem rather straightforward, since linearization facts can be handled easily. I assumed constituent order domains and discontinuous constituents for German myself for over a decade (Müller 1995; 2004). However, there are some problems that suggest that a traditional GB-like head-movement approach is the better alternative. In what follows, I want to discuss just two problematic aspects of linearization approaches: spurious ambiguities and apparent multiple frontings.

### **6.4.1 Partial fronting and spurious ambiguities**

Kathol (2000) suggests an analysis of German clause structure with binary branching structures in which all arguments are inserted into a linearization domain and can be serialized there in any order, provided no LP rule is violated. Normally one would have the elements of the COMPS list in a fixed order, combine the head with the elements of the COMPS list one after another, and let the freedom in the DOM list be responsible for the various attested orders. So, both sentences in (41) would have analyses in which the verb *erzählt* 'tells' is combined with *Geschichten* 'stories' first and then *Geschichten erzählt* 'stories tells' is combined with *den Wählern* 'the voters'. Since the verb and all its arguments are in the same linearization domain, they can be ordered in any way, including the two possibilities in (41):


(41) a. weil er den Wählern Geschichten erzählt
        because he the.DAT voters stories tells
        'because he tells the voters stories'
     b. weil er Geschichten den Wählern erzählt
        because he stories the.DAT voters tells


The problem with this approach is that examples like (42) show that grammars have to account for fronted combinations of the verb and any of its objects to the exclusion of the other:

(42) a. Geschichten erzählen sollte man den Wählern nicht.
        stories tell should one the.DAT voters not
        'One should not tell the voters stories.'
     b. Den Wählern erzählen sollte man diese Geschichten nicht.
        the.DAT voters tell should one these stories not
        'One should not tell the voters these stories.'

Kathol (2000: Section 8.9) accounts for examples like (42) by relaxing the order of the objects in the valence list. He uses the shuffle operator ◯, which was explained in (35) above, in the valence representation:

(43) ⟨NP[*nom*]⟩ ⊕ (⟨NP[*dat*]⟩ ◯ ⟨NP[*acc*]⟩)

This solves the problem with examples like (42) but it introduces a new one: sentences like (41) now have two analyses each. One is the analysis we had before and another one is the one in which *den Wählern* 'the voters' is combined with *erzählt* 'tells' first and the result is then combined with *Geschichten* 'stories'. Since both objects are inserted into the same linearization domain, both orders can be derived. So we have too much freedom: freedom in linearization and freedom in the order of combination. The proposal that I suggested in Müller (2005a: Section 2.1; 2021b: Section 2.2.1) and which is implemented in the schema in (19) above has just the freedom in the order of combination and hence can account for both (41) and (42) without spurious ambiguities.
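The spurious ambiguity can be made explicit by counting derivations per surface order; the encoding below is my own and purely illustrative:

```python
from collections import Counter
from itertools import permutations

# With <NP[dat]> shuffled with <NP[acc]> in the valence list, as in (43),
# both combination orders are licensed, and the linearization domain then
# allows both surface orders for each of them.
combination_orders = [("dat", "acc"), ("acc", "dat")]
analyses = [(comb, surface)
            for comb in combination_orders
            for surface in permutations(("dat", "acc"))]
print(Counter(surface for _, surface in analyses))
# Counter({('dat', 'acc'): 2, ('acc', 'dat'): 2}): every surface order has
# two derivations, so the additional analyses are spurious.
```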

# **6.4.2 Surface order, clause types, fields within fields, and empty elements**

Kathol (2001) develops an analysis of German that uses constituent order domains and determines the clause types on the basis of the order of elements in such domains. He suggests the topological fields *1*, *2*, *3*, and *4*, which correspond to the traditional topological fields *Vorfeld* 'prefield', *linke Satzklammer* 'left sentence bracket', *Mittelfeld* 'middle field', and *rechte Satzklammer* 'right sentence bracket'. Domain objects may be assigned to these fields, and they are then ordered by linearization constraints stating that objects assigned to *1* have to precede objects of type *2*, type *3*, and type *4*; objects of type *2* have to precede type *3* and type *4*; and so on. For the *Vorfeld* and the left sentence bracket, he stipulates uniqueness constraints saying that at most one constituent may be of this type. This can be stated in a nice way by using the linearization constraints in (44):


(44) a. 1 < 1
     b. 2 < 2

This trick was first suggested by Gazdar et al. (1985: 55, Fn. 3) in the framework of GPSG, and it works because, if there were two objects of type *1*, each one would be required to precede the other, resulting in a violation of the linearization constraint. So in order to avoid such a constraint violation, there must not be more than one *1*.
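The effect of such constraints can be sketched with a simple violation checker (an illustrative encoding of my own, with fields represented as strings):

```python
def violates_lp(order, lp_constraints):
    """An LP constraint a < b is violated if some b precedes some a. The
    irreflexive pairs ('1', '1') and ('2', '2') of (44) therefore rule out
    any domain with two objects of the same field."""
    for i, x in enumerate(order):
        for y in order[i + 1:]:
            if (y, x) in lp_constraints:  # y should precede x but follows it
                return True
    return False

lp = {("1", "2"), ("2", "3"), ("3", "4"), ("1", "1"), ("2", "2")}
print(violates_lp(["1", "2", "3", "4"], lp))  # False: a well-formed clause
print(violates_lp(["1", "1", "2"], lp))       # True: two prefield objects
```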

Kathol (2001: 58) assumes the following definition for V2 clauses:

(45) $$\textit{V2-clause} \Rightarrow \begin{bmatrix}\text{S[\textit{fin}]}\\ \text{DOM } \left\langle \begin{bmatrix}\textit{1}\end{bmatrix}, \begin{bmatrix}\textit{2}\\ \text{V[\textit{fin}]}\end{bmatrix}, \dots \right\rangle\end{bmatrix}$$

This says that the constituent order domain starts with one element assigned to field *1*, followed by another domain object assigned to field *2*. While this is in accordance with general wisdom about German, which is a V2 language, there are problems for entirely surface-based theories: German allows for multiple constituents in front of the finite verb. (46) shows some examples:

(46) a. [Zum zweiten Mal] [die Weltmeisterschaft] errang Clark 1965.<sup>16</sup>
        to.the second time the world.championship won Clark 1965
        'Clark won the world championship for the second time in 1965.'
     b. [Dem Saft] [eine kräftige Farbe] geben Blutorangen.<sup>17</sup>
        the.DAT juice a.ACC strong color give blood.oranges
        'Blood oranges give the juice a strong color.'

Müller (2003a) extensively documents this phenomenon. The categories that can appear before the finite verb are almost unrestricted. Even subjects can be fronted together with other material (Bildhauer & Cook 2010: 72; Bildhauer 2011: 371). The empirical side of these apparent multiple frontings was further examined in the Collective Research Center 632, Project A6, and the claim that only constituents that are dependents of the same verb can be fronted together (Fanselow 1993: 66; Hoberg 1997: 1634) was confirmed (Müller 2021b: Chapter 3). A further

<sup>16</sup>Der deutsche Straßenverkehr, 1968, Heft 6, p. 210, quoted after Neumann (1969: 224). See also Beneš (1971: 162).

<sup>17</sup>Bildhauer & Cook (2010: 69) found this example in the *Deutsches Referenzkorpus* (DeReKo), hosted at the Institut für Deutsche Sprache, Mannheim: http://www.ids-mannheim.de/kl/projekte/korpora, 2021-03-21.


insight is that the linearization properties of the fronted material (NPs, PPs, adverbs, adjectives) correspond to the linearization properties they would have in the *Mittelfeld*. The example in (47) is even more interesting. It shows that there can be a right sentence bracket (the particle *los*) and an extraposed constituent (something following the particle: *damit*) before the finite verb (*geht* 'goes'):

(47) [Los damit] geht es schon am 15. April.<sup>18</sup>
     off there.with goes it PRT on.the 15 April
     'The whole thing starts on the 15th of April.'
As far as topology is concerned, this sentence corresponds to sentences with VP fronting and extraposition like the one in (48) discussed in Reis (1980: 82).

(48) [Gewußt, daß du kommst,] haben wir schon seit langem. (German)
     known that you come have we PRT since long
     'We have known for a while that you are coming.'

In (48), *gewußt, daß du kommst* 'known that you come' forms a VP in which *gewußt* is the right sentence bracket and *daß du kommst* 'that you come' is extraposed. We have the same situation in (47) with *los* 'off' and *damit* 'there.with', except that one would not want to claim that *damit* 'there.with' depends on *los* 'off'.

In Kathol's system, *los* would be of type *4* and *damit* would have to be of type *5* (an additional type for extraposed items). Without any modification of the general system, we would get a *4* and a *5* ordered before a *2* (a right sentence bracket and a postfield preceding the left sentence bracket), something that is ruled out by Kathol's linearization constraints.

Müller (2002b), still working in a domain-based framework, developed an analysis assuming an empty verbal head to explain the fact that the fronted constituents have to depend on the same verb and that there is a separate topological area that is independent of the remaining clause. So, *los* and *damit* are domain objects within a larger domain object placed in the prefield. Wetta (2011) suggests an analysis in which two or more constituents are compacted into one domain object, so *los* and *damit* would form one object that is inserted into the domain containing the finite verb. However, this raises the question of what kind of object is formed. Section 6.3 dealt with partial compaction of NPs. Some of the elements from an NP domain were liberated and other elements were fused into



a new object that had the same category as the object containing all material, namely NP. But the situation with examples like (46) and (47) is quite different. We have a particle and a pronominal adverb in (47) and various other combinations of categories in the examples collected by Müller (2003a; 2005c; 2013a) and Bildhauer (2011). It would not make sense to claim that the fronted object is a particle or a pronominal adverb. Note that it is not an option to leave the category of the fronted object unspecified, since HPSG comes with the assumption that models of linguistic objects are total, that is, maximally specific (King 1999, see also Richter (2021), Chapter 3 of this volume). Leaving the category and valence properties of the item in the prefield unspecified would make such sentences infinitely ambiguous. Of course Wetta could state that the newly created object is a verbal projection, but this would just be stating the effect of the empty verbal head with a relational constraint, which I consider less principled than positing an empty element.

However, the empty verbal head that I stated as part of a linearization grammar in 2002 comes as a stipulation, since its only purpose in the grammar of German was to account for apparent multiple frontings. Müller (2005b; 2021b) drops the linearization approach and assumes head-movement instead. The empty head that is used for accounting for the verb position in German can also be used to account for apparent multiple frontings. The analysis is sketched in (49):

(49) a. [VP [Zum zweiten Mal] [die Weltmeisterschaft] _V] errang Clark 1965 _ _. (German)
        to.the second time the world.championship won Clark 1965
        'Clark won the world championship for the second time in 1965.'
     b. [VP Los _V damit] geht es schon am 15. April _ _.
        off there.with goes it PRT on 15. April
        'The whole thing starts on the 15th April.'

Space precludes going into all the details here, but the analysis treats apparent multiple frontings parallel to partial verb phrase frontings. A lexical rule is used for multiple frontings which is a special case of the head-movement rule that was discussed in Section 5.1. So, apparent multiple frontings are analyzed with means that are available to the grammar anyway. This analysis allows us to keep the insight that German is a V2 language and it also gets the same-clause constraint and the linearization of elements right. As for (49b): *los damit* 'off there.with' forms a verbal constituent placed in the *Vorfeld* and within this verbal domain, we have the topological fields that are needed: the right sentence bracket for the verbal particle and the verbal trace and the *Nachfeld* for *damit* 'there.with'. See Müller (2005a,b; 2021b) for details.


This chapter so far has discussed the tools that have been suggested in HPSG to account for constituent order: flat vs. binary branching structures, linearization domains, and head movement via DSL. I showed that analyses of German relying on discontinuous constituents and constituent order domains are not without problems and that head-movement approaches with binary branching and continuous constituents can account for the data. I also demonstrated in Section 6.2 that languages like Warlpiri, which allow for much freer constituent order than German, can be accounted for in models allowing for discontinuous constituents. The following section discusses a proposal by Bender (2008) showing that even such free constituent order languages can be handled without discontinuous constituents.

# **7 Free constituent order languages without order domains**

Bender (2008) discusses the Australian language Wambaya and shows how phenomena parallel to those treated by Donohue & Sag (1999) can be handled without discontinuous constituents. Bender assumes that all arguments of a head are projected to higher nodes even after they are combined with the head; that is, arguments are not canceled off from valence lists. See also Meurers (1999), Przepiórkowski (1999) and Müller (2008) for earlier non-cancellation approaches.<sup>19</sup> Example (38) from Section 6.2 can be recast with continuous constituents as shown in Figure 13. The figure shows that arguments are not removed from the valence representation after combination with the head. Rather, they are marked as satisfied (represented in Figure 13 as the tag 1 with a slash: /1). Since they are still in the representation, schemata may refer to them. Bender suggests a schema that identifies the MOD value of an element that could function as an adjunct in a normal head-adjunct structure with an element in the valence representation. In Figure 13, the MOD value of the second ergative nominal *wita-jarra-rlu* 'small' is identified with an argument of the auxiliary verb (1). The adjunct hence has access to the referential index of the argument, and it is therefore guaranteed that both parts of the noun phrase refer to the same discourse referent. The NP for *kurdu-jarra-rlu* is combined with the projection of the auxiliary to yield a complete sentence. Since 1 contains not only the semantic index, and hence information about number (the dual), but also case information, it is ensured that distributed noun phrases bear the same case. Since information about all arguments is projected along the head path, 2 would also be available for an adjunct referring to it. So in the place of *wita-jarra-rlu* 'small-DU-ERG' we could also have another adjunct referring to *maliki* 'dog.ABS'. This shows that even languages with constituent order as free as that of Australian languages can be handled within HPSG without assuming discontinuous constituents.

<sup>19</sup>Higginbotham (1985: 560) and Winkler (1997: 239) make similar suggestions with regard to the representation of theta roles.

Figure 13: Analysis of free constituent order in Warlpiri using non-cancellation
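The following Python fragment sketches the non-cancellation idea under strong simplifying assumptions: valence elements are kept on the list and merely flagged as satisfied, and an adjunct picks up the referential index of a case-matching element. The data structures and names are mine, not Bender's actual feature geometry:

```python
from dataclasses import dataclass

@dataclass
class Arg:
    index: str          # referential index (number information included)
    case: str           # e.g. 'ERG', 'ABS'
    satisfied: bool = False

def realize(valence, i):
    """Combining the head with argument i marks it as satisfied but
    keeps it on the list, so later schemata can still refer to it."""
    valence[i].satisfied = True
    return valence

def adjunct_index(valence, case):
    """Adjunct-as-argument schema: identify the adjunct's MOD value with
    a case-matching element of the valence list and return its index."""
    for arg in valence:
        if arg.case == case:
            return arg.index      # shared index: same discourse referent
    return None

valence = [Arg("x1-dual", "ERG"), Arg("x2", "ABS")]
realize(valence, 0)                        # 'kurdu-jarra-rlu' combined
print(adjunct_index(valence, "ERG"))       # 'wita-jarra-rlu' -> 'x1-dual'
print(adjunct_index(valence, "ABS"))       # an ABS adjunct -> 'x2'
```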

# **8 Summary**

A major feature of constraint-based analyses is that where no constraints are stated, there is freedom. This chapter discussed the order of head and adjunct: if the order of head and adjunct is not constrained, both orders are admitted.

This chapter explored general approaches to constituent order in HPSG. On the one hand, there are approaches to constituent order that assume flat constituent structure, allowing permutation of daughters as long as no LP constraint is violated. On the other hand, there are approaches assuming binary branching structures. Approaches that assume flat structures can serialize the head to the left or to the right or somewhere between other daughters in the structure. Approaches assuming binary branching have to use other means. One possibility is "head movement", which is analyzed as a series of local dependencies by passing information about the missing head up along the head path. The alternative to head movement is linearization of elements in special linearization


domains, allowing for discontinuous constituents. I showed that there are reasons for assuming head-movement for German and how even languages with extremely free constituent order can be analyzed without assuming discontinuous constituents.

# **Acknowledgments**

I thank Anne Abeillé, Bob Borsley, Doug Ball and Jean-Pierre Koenig for very detailed and very useful comments. I thank Elizabeth Pankratz for comments and for proofreading.

# **References**




# **Chapter 11**

# **Complex predicates**

# Danièle Godard

Université de Paris, Centre national de la recherche scientifique (CNRS)

# Pollet Samvelian

Université Sorbonne Nouvelle

Complex predicates are constructions in which a head attracts arguments from its predicate complement. Auxiliaries, copulas, predicative verbs, certain control or raising verbs, perception verbs, causative verbs and light verbs can head complex predicates. This phenomenon has been studied in HPSG across different languages, including Romance and Germanic languages, Korean and Persian, each of which illustrates different aspects of complex predicate formation. Romance languages show that argument inheritance is compatible with different phrase structures. German, Dutch and Korean show that argument inheritance can induce different word order properties. Persian shows that a complex predicate can be preserved under a derivation rule (nominalization of a verb) and, most importantly, that in Persian, which has relatively few simplex verbs, light verb constructions are used to turn nouns into verbs.

# **1 Introduction**

Words such as verbs, nouns, adjectives or prepositions typically denote predicates that are associated with arguments, and those arguments are typically syntactically realized as the subject, complements or specifier of those words. For instance, a verb such as *to eat* has two arguments, realized as its subject and its object, and understood as agent (the eater) and patient (what is eaten). Usually, arguments are associated with just one predicate (one word). However, in constructions called *complex predicates*, two or more predicates associated with

Danièle Godard & Pollet Samvelian. 2021. Complex predicates. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 419–488. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599838


words behave as if they formed just one predicate, while keeping their status as different words in the syntax. For instance, tense auxiliaries in Romance languages form a complex predicate with the participle which follows, but they are different words, since they can be separated by an adverb, as in French *Lucas a rapidement lu ce livre* 'Lucas has quickly read this book'; see (1). Several properties set apart complex predicates from ordinary predicates, and those properties can differ from one language to another. In HPSG, complex predicates are analyzed as constructions in which one predicate, the head, "attracts" the arguments of the other, that is, the syntactic arguments of one word or predicate include the syntactic arguments of another word or predicate. This chapter is devoted to the various analyses of complex predicates that have been proposed within HPSG and some of the cross-linguistic variation in the behavior of complex predicates, focusing on French, German, Korean and Persian.

# **2 What are complex predicates?**

The term *complex predicate* does not have a universally accepted definition. In this section, we explain how it is used in HPSG to name a syntactic phenomenon where two (or more) words form what appears to be a single predicate because the head is attracting the (syntactic) arguments of its complement. We then mention the work that has been done in different languages on this aspect of natural language grammars and the constructions in which it manifests itself. Finally, we contrast our use of the term *complex predicates* with other uses of the term and with related phenomena, in particular serial verb constructions (SVCs).

# **2.1 Definition**

In the HPSG tradition, a complex predicate is composed of two or more words, each of which is itself a predicate. By predicate, we mean either a verb or a word of a different category (noun, adjective, preposition, particle) which is associated with an argument structure. A complex predicate is a construction in which the head attracts the arguments of the other predicate, which is its complement: the arguments selected by the complement predicate "become" the arguments of the head (Hinrichs & Nakazawa 1989; 1994; 1998). The phenomenon is called *argument attraction*, *argument composition*, *argument inheritance* or *argument sharing*.

To take an example, tense auxiliaries and the participle in Romance languages are two different words, since they can be separated by adverbs, as in the French


examples in (1), but the two verbs belong to the same clause, and, more precisely, the syntactic arguments belong to one argument structure. We admit that the property of monoclausality can manifest itself differently in different languages (Butt 2010: 57–59). In the case of Romance auxiliary constructions, the first verb (the auxiliary) hosts the clitics which pronominalize the arguments of the participle: corresponding to the NP complement *son livre* 'his book' in (1a), the pronominal clitic *l(e)* is hosted by the auxiliary *a* 'has' in (1b) and (1c). This contrasts with the construction of a control verb such as *vouloir* 'to want', where the clitic corresponding to the argument of the infinitive is hosted by the infinitive, as in (2) (from Abeillé & Godard 2002: 406):


This approach to complex predicates goes back to Relational Grammar (Aissen & Perlmutter 1983): although formalized in a different way, their analysis of causative constructions in Romance languages relies on such argument attraction, under the name of *clause union*. Similarly, in Lexical Functional Grammar, Andrews & Manning (1999) speak of complex predicates as building a domain of

<sup>1</sup>Possible in an earlier stage of French.


grammatical relations sharing. It is also present in Categorial Grammar (Geach 1970), with complex categories whose definition takes into account the nature of the argument they combine with and the operation of function composition. In particular, Kraak (1998: 301) accommodates complex predicates by introducing a specific mode of combination called *clause union mode*, where two verbs (two lexical heads) are combined. But, in this account, there is no argument attraction in general, the mechanism being specifically defined in order to account for clitic climbing.

There are other definitions of complex predicates. The term has been used to describe the complex content of a word, when it can be decomposed. For instance, the verb *dance* has been analyzed as incorporating the noun *dance* and considered a "complex predicate" (Hale & Keyser 1997: 31, 41). In the sense adopted here, complex predicates involve at least two words, and are syntactic constructions. Closer to what we consider here to be complex predicates is the case of Japanese passive or causative verbs, illustrated in (3).

(3) tabe-rare-sasete-i-ta. (Japanese)
    eat-PASS-CAUS-PROG-PST

'(Someone) was causing (something) to be eaten.'

The causative morpheme adds a causer argument, and behaves as if it took the verb stem as its complement (more precisely, the verb stem with the passive morpheme, in this case), whose expected subject appears as the object of the causative verb. This operation is like argument attraction. However, it happens in the lexicon rather than in syntax: the elements in (3) are bound morphemes, and they form a word (Manning et al. 1999).<sup>2</sup> Thus, we do not consider causative verbs in Japanese to constitute complex predicates.

Complex predicates are sometimes given a semantic definition: the two elements together describe one situation. This may be appropriate for some complex predicates, such as light verb constructions (*to have a rest*, *to make a proposal*) (Butt 2010: 71–74). However, such a semantic definition does not coincide with the syntactic one. It is true that the head verb of a complex predicate tends to add tense, aspectual or modal information, while the other element describes a situation type. Thus, in (1), the two verbs jointly describe one situation, the auxiliary adding tense and aspect information. But the semantics of a complex predicate is not always different from that of ordinary verbal complements. Thus, there is no evident semantic distinction depending on whether the Italian restructuring

<sup>2</sup>Gunji (1999) proposes a dual representation of Japanese causatives, with a VP embedding structure as well as a monoclausal morphological and phonological structure.


verb *volere* 'to want' is the head of a complex predicate (4a) or not (4b), and the two verbs do not seem to describe just one situation (Monachesi 1998: 314).

(4) a. Anna lo vuole comprare. (Italian)
       Anna it wants buy
       'Anna wants to buy it.'
    b. Anna vuole comprarlo.
       Anna wants buy-it
       'Anna wants to buy it.'

The same point is made for Hindi in Poornima & Koenig (2009: 289–297). They show that there exist two structures combining an aspectual verb and a main verb; in one of them, the aspectual verb is the head of a complex predicate while, in the other one, it is a modifier of the main verb. In more general terms, complex predicates show that syntax and semantics are not always isomorphic in a language. Thus, although the semantic definition of complex predicates may be useful for some purposes, we will ignore it here.

The distinction between complex predicates and serial verb constructions, for example the one illustrated in (5) (from Haspelmath 2016: 294), where both *sàán* and *rrá* are verbs, is not obvious (e.g. Andrews & Manning 1999; Haspelmath 2016). The main reason is that the constructions which have been dubbed SVCs are different in different languages; we agree with Andrews & Manning (1999) that they do not share a grammatical mechanism, but they do share more superficial tendencies, such as their resemblance to paratactic constructions due to the absence of marking of complementation or coordination, and they also involve more semantic relations than are usually associated with complementation or coordination.

(5) Òzó sàán rrá ógbà. (Edo)
    Ozo jump cross fence
    'Ozo jumped over the fence.'

Accordingly, SVCs are not within the purview of complex predicates, and will not be studied in this chapter (but see Lee 2014).

# **2.2 Constructions involving complex predicates**

Complex predicates enter into a number of constructions across languages. They differ from ordinary constructions in different ways, depending on the construction, such as the position of pronominal clitics in Romance languages ("clitic climbing"), word order or special semantic combinations.

The following have been particularly studied in HPSG:


In this chapter, we examine some of these constructions which illustrate the different ways in which complex predicates differ from ordinary verbs.

# **3 The basic mechanism in HPSG: Argument attraction**

In HPSG, complex predicates are analyzed in the following way: one of the predicates is the head of the construction, and it attracts the syntactic arguments of the other predicate, that is, its complements and, possibly, its subject. We illustrate it with tense auxiliaries in French (Abeillé & Godard 1995; 2002).


In French, auxiliary constructions consist of a tense auxiliary (*avoir* 'to have' or *être* 'to be') followed by a past participle and its complements, as illustrated in (1) on p. 421. The auxiliary is the head. It bears inflectional affixes (for tense and person) like any other verb, and if the sentence is declarative, it is in the indicative form as expected; for example, the auxiliary in (1) has the form of a present indicative third person. The auxiliary also hosts pronominal clitics, as verbal heads in general do, as shown in (1b) and (1c). Moreover, it can be gapped alone, as (6a) shows, while the participle can only be gapped with the auxiliary, as illustrated by (6b) and (6c);<sup>3</sup> this is expected if the auxiliary is the head, since it behaves like *pense* 'think' in (6d), while the participle behaves like the infinitive in (6e) and (6f).

	b. Lola a acheté des pommes, et Alice (a acheté) des pêches.
	   Lola has bought some apples and Alice (has bought) some peaches
	   'Lola has bought apples, and Alice (has bought) peaches.'

	'Lola is thinking of buying apples, and Alice (is thinking of) picking peaches.'

	e. Lola pense acheter des pommes, et Alice (pense acheter) des pêches.
	   Lola thinks buy some apples and Alice (thinks buy) some peaches
	   'Lola is thinking of buying apples, and Alice (is thinking of buying) peaches.'

<sup>3</sup>Note that (6c) is acceptable with the possession verb *avoir*.


	f. * Lola pense cueillir des pommes et Alice pense des pêches.
	     Lola thinks pick some apples and Alice thinks some peaches
	   Intended: 'Lola is thinking of picking apples and Alice is thinking of (picking) peaches.'

The auxiliary construction in French is a complex predicate: the clitic corresponding to a complement of the participle is hosted by the auxiliary (it is said to "climb") as in (1b). Moreover, it occurs in bounded dependencies such as the infinitival complement of adjectives like *facile* 'easy' or *impossible* 'impossible', whose nominal complement is unexpressed, as in (7a); this unexpressed complement can be that of a participle (7c) but not that of an infinitive complement (7b). This follows if the unexpressed complement is in fact treated as the complement of the auxiliary.

	b. * Cette technique est impossible à réussir à maîtriser en un jour.
	     this technique is impossible to manage to master in one day
	   Intended: 'This technique is impossible to manage to master in one day.'
	c. Cette technique est impossible à avoir maîtrisé en un jour.
	   this technique is impossible to have mastered in one day
	   'This technique is impossible to have mastered in one day.'

These two properties (clitic climbing and occurrence in bounded dependencies) follow if the complements of the participle become those of *avoir* 'to have'. In fact, both clitic climbing and the dependency found in 'easy'/'impossible' constructions belong to the set of bounded dependencies. In addition, the tense auxiliary *avoir* 'to have' is a subject raising verb (see Abeillé 2021, Chapter 12 of this volume): the subject is selected by the participle and shared by the auxiliary. For instance, *Paul* is an agent in (1a) (*Paul a lu son livre*, 'Paul has read his book') because *lire* 'to read' requires an agent subject, and in e.g. *Il a fait froid* (lit. It has made cold, 'It [the weather] was cold'), the subject is the impersonal subject *il*, because that is the subject of the participle *fait froid*. Thus, the auxiliary *avoir* (like tense auxiliary *être* 'to be') is, in fact, a generalized raising verb: its whole argument structure is identified with that of the participle. A simplified description of subject raising verbs and tense auxiliaries is given in (8) (for the feature [LIGHT±], see Section 4).<sup>4</sup>

<sup>4</sup>⊕ stands for the relation *append* and simply concatenates two lists. For example, ⟨a, b⟩ ⊕ ⟨c, d⟩ = ⟨a, b, c, d⟩.


(8) a. Ordinary subject raising verb:
       [ARG-ST 1 ⊕ ⟨ [SUBJ 1, COMPS ⟨ ⟩] ⟩ ⊕ *list*]
    b. Tense auxiliary as head of a complex predicate:
       [ARG-ST 1 ⊕ ⟨ [SUBJ 1, ARG-ST 1 ⊕ 2] ⟩ ⊕ 2]

The subject raising verb takes a saturated complement, which is described as the second element of the argument structure, expecting a subject 1 identified with the subject of the raising verb. The notation 1 instead of ⟨ 1 ⟩ indicates that this element may be absent: it is meant to accommodate subjectless verbs. In addition, the raising verb may have its own complements, noted here as *list*. On the other hand, the auxiliary is not only a subject raising verb, but takes as a complement a participle which has not combined with any complements.
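A rough Python rendering of the contrast in (8) may help; lists of strings stand in for *synsem* objects, so the structure sharing of real AVMs is only approximated, and all names are illustrative assumptions:

```python
def append(l1, l2):
    """The HPSG append relation on lists: <a,b> + <c,d> = <a,b,c,d>."""
    return l1 + l2

# Participle 'lu' (read): subject and object on its ARG-ST.
participle = {"PHON": "lu", "SUBJ": ["NP[Marie]"], "COMPS": ["NP[son livre]"]}
participle["ARG-ST"] = append(participle["SUBJ"], participle["COMPS"])

def tense_auxiliary(prt):
    """Generalized raising as in (8b): the auxiliary's ARG-ST is the
    participle's subject, the participle itself, and the participle's
    complements -- the whole argument structure is shared."""
    subj, comps = prt["SUBJ"], prt["COMPS"]
    return {"PHON": "a", "SUBJ": subj,
            "ARG-ST": append(subj, append([prt], comps))}

aux = tense_auxiliary(participle)
print([a["PHON"] if isinstance(a, dict) else a for a in aux["ARG-ST"]])
# ['NP[Marie]', 'lu', 'NP[son livre]']
```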

The arguments of a word are made up of subject and complements. The relation between (expected) arguments and realized subject and complements is as in (9) (see Ginzburg & Sag 2000: 171; Bouma et al. 2001: 12). The arguments include the subject and the complements, but also a list of non-canonical elements (possibly empty; see below).<sup>5</sup>

(9) Argument Realization Principle (adapted from Ginzburg & Sag 2000: 171):

*word* ⇒ [SUBJ 1, COMPS 2 ⊖ *list*(*non-canonical*), ARG-ST 1 ⊕ 2]
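The principle and the contained list difference ⊖ of footnote 5 can be sketched as follows; this is a simplification in which *synsem* objects are plain strings and non-canonical elements are tagged by prefix, and the encoding is mine, not Ginzburg & Sag's:

```python
def contained_diff(l2, l1):
    """Contained list difference: remove the members of l1 from l2;
    undefined (here: an error) if l1 is not a sublist of l2."""
    out = list(l2)
    for x in l1:
        if x not in out:
            raise ValueError("contained list difference undefined")
        out.remove(x)
    return out

def realize(arg_st, non_canonical):
    """Argument Realization Principle (9): SUBJ is the first element;
    COMPS is the rest minus the non-canonical elements (affixes, null
    pronouns), which are not realized as phrasal complements."""
    subj, rest = arg_st[:1], arg_st[1:]
    return {"SUBJ": subj, "COMPS": contained_diff(rest, non_canonical)}

# Reduced auxiliary in 'Marie l'a lu': the affixal argument is on
# ARG-ST but not on COMPS (it is realized as the clitic l').
print(realize(["NP[Marie]", "V[lu]", "aff-3sg"], ["aff-3sg"]))
# {'SUBJ': ['NP[Marie]'], 'COMPS': ['V[lu]']}
```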

In (10a), the participle *lu* 'read' selects the argument *son livre* 'her book', which is attracted by the auxiliary *a* 'has'. Accordingly, it is realized as the complement of the auxiliary *a*. The structure of the VP in (10a) is given in Figure 1.

	b. Marie l'a lu.
	   Marie it has read
	   'Mary has read it.'

<sup>5</sup>Ginzburg & Sag (2000: 170) state the following about ⊖: "Here '⊖' designates a relation of contained list difference. If Σ2 is an ordering of a set σ2 and Σ1 is a subordering of Σ2, then Σ2 ⊖ Σ1 designates the list that results from removing all members of Σ1 from Σ2; if Σ1 is not a sublist of Σ2, then the contained list difference is not defined."


Figure 1: VP structure in French

Figure 2: Subtypes of *synsem*

Let us turn to pronominal clitics. Arguments are of type *synsem*, which can have different subtypes (Figure 2). Usually, these subtypes are not specified on lexemes, but they are on words occurring in sentences.

Romance clitics, illustrated by *l*(*e*) in (10b), are analyzed as affixes (*aff* ) on verbs, which correspond to arguments of the verb (Miller & Sag 1997). They belong to the argument structure of the participle, and are attracted by the auxiliary, although they are not realized as complements. In (10b) and Figure 3, the arguments of the auxiliary are the subject 1 , the participle 3 , and 2 ; 2 is typed as an affix, third person, masculine singular. It belongs to the argument structure, but not to the complement list of the auxiliary (see (9)).

Figure 3: Clitic climbing in French

We distinguish between *basic verbs* and *reduced verbs*, following Abeillé, Godard & Sag (1998). With basic verbs, the argument list is simply the concatenation of the subject and complements, while reduced verbs have at least one affix argument which belongs to the argument list, but not to the complement list. Such verbs are subject to a morphological rule which realizes this affixal argument as an affix, the so-called clitic pronoun *l*(*e*). Thus, in Figure 1, both the auxiliary *a* 'has' and the participle *lu* 'read' are basic verbs: the arguments tagged 3 and 2 are also complements. On the other hand, in Figure 3, the participle is a basic verb – argument 2 is typed as an affix, but is also a complement – while the auxiliary is a reduced verb: argument 2 is not a complement of the auxiliary, and the verb hosts the affix *l*(*e*). Note that the Argument Realization Principle (9) allows a verb to expect a complement typed as *affix*: it allows arguments to be non-canonical (among which affixes), but it does not force complements to be canonical. If the complement is typed as affix, it has to be attracted by a different head, or it is realized as an affix. In the latter case, the verb must be a *reduced verb*. This is not the case for the participle in Figure 3, which is a *basic verb*.
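The division of labor can be sketched as follows; the spell-out table and all names are hypothetical stand-ins for the actual morphological rule, not the published formalization:

```python
SPELL_OUT = {"aff-3sg-masc": "l'"}   # hypothetical affix-to-form mapping

def is_reduced(verb):
    """Reduced verb: at least one affixal argument is on ARG-ST
    but absent from the COMPS list."""
    return any(a.startswith("aff") and a not in verb["COMPS"]
               for a in verb["ARG-ST"])

def cliticize(verb):
    """Morphological rule: realize the affixal arguments of a reduced
    verb as clitics on the verb form; basic verbs are not a target."""
    if not is_reduced(verb):
        return verb["PHON"]
    affs = [a for a in verb["ARG-ST"] if a.startswith("aff")]
    return "".join(SPELL_OUT[a] for a in affs) + verb["PHON"]

aux = {"PHON": "a", "ARG-ST": ["NP", "V[lu]", "aff-3sg-masc"],
       "COMPS": ["V[lu]"]}            # reduced: the affix was attracted
print(cliticize(aux))                 # l'a -- as in 'Marie l'a lu'
```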

In French, past participles never host clitics, as we saw in (1c), which we assume to be a morphological property. But in Italian, past participles may host clitics, although never when they combine with the auxiliary. The specification that the participle complement of the auxiliary is a basic verb accounts for this property, because basic verbs are not the target of the morphological rule realizing the affixal argument as an affix. Although both verbs in Figure 3 have an affixal argument, one is a basic verb (the participle), the affixal argument being also an expected complement, and the other is a reduced verb (the auxiliary), this affixal argument not being an expected complement.<sup>6</sup>

# **4 Different structures for complex predicates: Restructuring verbs and the copula in Romance languages**

In addition to tense auxiliaries, Romance languages have other cases of complex predicates that are headed by restructuring verbs, by the copula and other verbs taking predicative complements, and by certain causative and perception verbs. We focus here on restructuring verbs and the copula. An analysis of causative and perception verbs is proposed in Abeillé et al. (1995); Abeillé, Godard, Miller & Sag (1998); Abeillé & Godard (2010).

A comparison of the properties of constructions headed by restructuring verbs in different Romance languages illustrates an important aspect of the phenomenon: argument attraction is compatible with different syntactic structures. Restructuring verbs enter either a flat structure or a verbal complex (Monachesi 1998; Abeillé & Godard 2001a; 2010).<sup>7</sup> As for the copula, it differs from tense auxiliaries and restructuring verbs in two respects: its complement always behaves like a phrase, although it can be fully saturated for its complements, partially saturated or not saturated at all (Abeillé & Godard 2001b; 2002); and it has a uniform behavior and analysis across the Romance languages.

<sup>6</sup>It is worth noting that tense auxiliaries can take as complement a coordination of participles:

	b. Jean l'a acheté et lu.
	   Jean it has bought and read
	   'Jean bought and read it.'

This may be seen as raising a difficulty for the analysis of their complement based on argument structure sharing, since argument structure characterizes words rather than phrases. However, coordinations of words are a special kind of phrase, since the conjuncts must share their argument structure. It is plausible that such coordinations inherit an argument structure from the conjuncts (for further discussion of coordination, see Abeillé & Chaves 2021, Chapter 16 of this volume).

<sup>7</sup>However, see recent work by Aguila-Multner & Crysmann (2020), who analyze French tense auxiliaries in terms of 'periphrasis', with a VP complement.


# **4.1 Romance restructuring verbs as head of complex predicates**

Certain verbs in Romance languages, called *restructuring verbs*, exhibit two behaviors: either as ordinary verbs taking a VP complement or as heads of complex predicates (Rizzi 1982; Aissen & Perlmutter 1983). Restructuring verbs are modal, aspectual or movement verbs (such as *venire* 'to come', *andare* 'to go', *correre* 'to run', *tornare* 'to come back' in Italian). However, it must be kept in mind that this behavior is lexical: verbs which are close semantically may or may not be heads of complex predicates.

Several properties show that such verbs can head complex predicates (Monachesi 1998: 323–328). The first is clitic climbing, which is possible with restructuring verbs, though optional (while it is obligatory with tense auxiliaries). The examples in (11) all mean 'John wants to eat them' (examples from Abeillé & Godard 2010: 113). For each language, the first example illustrates the complex predicate, and the second one the VP complement construction, with the clitic downstairs.




The second property showing restructuring verbs' complex predicate status is the medio-passive or middle *si* construction, where the verb hosts the reflexive clitic *si* or *se* (12b) (depending on the language), and the subject corresponds to the object of the active construction (12a), with an interpretation close to that of middles in English. The construction is possible with restructuring verbs such as *potere* 'to be able to' (12c) and (12d) (see Monachesi 1998: 333–336), but not with verbs only taking an infinitival VP complement such as *parere* 'to appear' (12e) (examples (12d) and (12e) from Abeillé & Godard 2010: 122).

(12) b. Queste camicie si stirano facilmente. (Italian)
        these shirts SI iron easily
        'These shirts iron easily.'
     c. Giovanni può stirare queste camicie facilmente.
        Giovanni can iron these shirts easily
        'Giovanni can iron these shirts easily.'
     d. Queste camicie si possono stirare facilmente.
        these shirts SI can iron easily
        'These shirts can be ironed easily.'
     e. * Queste camicie si paiono stirare facilmente.
          these shirts SI appear iron easily
        Intended: 'These shirts appear to be ironed easily.'

The medio-passive verb alternates with a transitive verb: it is the result of a lexical rule, shown in (13), which takes a transitive verb like *stirare* as in (12a) to give a verb whose subject corresponds to the expected object of the transitive verb and which acquires a reflexive clitic noted as *a-aff* (realized as *si* or *se*) as in (12b) (Abeillé, Godard & Sag 1998: 31; Monachesi 1998).

(13) Medio-Passive Lexical Rule:
     [ARG-ST ⟨ NP, NP[*acc*] ⟩ ⊕ 1] ↦ [ARG-ST ⟨ NP, [*a-aff*, *acc*] ⟩ ⊕ 1]


What is crucial here is that the input is a verb taking an accusative NP complement. Hence, a verb taking a VP complement like Italian *potere* 'to be able to' or *parere* 'to appear' cannot be the input, since it lacks an NP complement. On the other hand, the corresponding restructuring verb *potere* can be the input, since it inherits such a complement from the infinitive: the verb *potere* in (12c) inherits *queste camicie* 'these shirts' from *stirare* 'to iron', allowing it to be the input to rule (13), which gives the verb occurring in (12d).
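A minimal sketch of rule (13) as a partial function, assuming a flat string encoding of ARG-ST and locating the accusative NP by search rather than by obliqueness (both simplifications of the published rule); the index linking of the new subject to the former object is shown only in comments:

```python
def medio_passive(arg_st):
    """Promote the accusative NP complement to subject and add the
    reflexive affix (realized as si/se); returns None when the verb
    has no NP[acc] on its ARG-ST, i.e. the rule is not applicable."""
    if "NP[acc]" not in arg_st[1:]:
        return None
    rest = arg_st[1:]
    rest.remove("NP[acc]")
    return ["NP"] + ["a-aff[acc]"] + rest   # new subject = former object

print(medio_passive(["NP", "NP[acc]", "Adv"]))     # transitive 'stirare': applies
print(medio_passive(["NP", "VP[inf]"]))            # 'parere' + VP: None
print(medio_passive(["NP", "V[inf]", "NP[acc]"]))  # restructured 'potere': applies
```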

The third relevant property of restructuring verbs is their acceptability in bounded dependencies, as illustrated in (7) for tense auxiliaries and (14) for restructuring verbs. (14b) (from Monachesi 1998: 341) relies on *cominciare* 'to begin' being a restructuring verb, while *promettere* 'to promise' is not (14c).

(14) b. Questa canzone è facile da cominciare a apprendere. (Italian)
        this song is easy to begin to learn
        'This song is easy to begin to learn.'
     c. * Questa canzone è facile da promettere di apprendere.
          this song is easy to promise to learn
        Intended: 'This song is easy to promise to learn.'

The complement of adjectives such as 'easy' in Romance languages is a bounded dependency: they take an infinitival complement whose own expected complement (we analyze it as a null pronoun; see Figure 2) is coindexed with its subject (Abeillé, Godard & Sag 1998; Monachesi 1998).<sup>8</sup>

(15) [HEAD *adjective*,
      ARG-ST ⟨ XPᵢ, VP[VFORM *infinitive*, MARKING *da*, COMPS ⟨ [*null-pro*, *acc*]ᵢ ⟩ ⊕ *list*] ⟩]

Complex predicates can occur in this construction because their head attracts the complement of their complement. Thus, in (14b), *cominciare* 'to begin' is expecting the same object as *apprendere* 'to learn', which is coindexed with the subject of the copular construction, in the same way as *apprendere* is expecting an object in (14a).

<sup>8</sup>Forms such as *a*, *da* and *di*, which introduce infinitival complements in (14), are not analyzed as heads, but as markers, a part of speech which has the feature MARKING and whose value is specific to the form. Markers select the head with which they combine (for instance, *da* selects an infinitival VP in (14a)), and the feature is shared by the whole VP. Hence, the adjective *facile* 'easy' in Italian takes as a complement an infinitival VP [MARKING *da*].

Fourth and finally, the possibility of preposing the verbal complement of a verb which can take a VP complement or be the head of a complex predicate disappears when there is evidence of a complex predicate. For the sake of simplicity, we now concentrate on Italian and Spanish. The data in (16), with a preposed VP, contrast with those in (17) (both examples from Abeillé & Godard 2010: 132), where the head verb bears a clitic corresponding to the expected complement of the infinitive. Preposing of the verbal complement is associated with pronominalization (*lo*) in Italian (16a) but not in Spanish (16b), where it is more natural in contrastive contexts.


(16) b. Hablar-le a María seguramente quiere (pero no a su madre). (Spanish)
        talk-to.her to María certainly wants but not to her mother
        'Talk to María, certainly he wants to (but not to her mother).'

(17) a. * Parlare, certamente glielo vuole. (Italian)
          talk certainly to.him/her-it wants
        Intended: 'Talk to him, he certainly wants to.'
     b. * Hablar, le quiere (pero no mucho tiempo). (Spanish)
          talk to.him/her wants but not a.long time
        Intended: 'Talk to him/her he wants to (but not for a long time).'

We assume that restructuring verbs have two possible descriptions: as ordinary verbs taking an infinitival VP complement, or as heads of complex predicates. They are related by the Argument Attraction Lexical Rules given in (18) (adapted from Monachesi 1998: 331).<sup>9</sup>

<sup>9</sup>We leave aside the object control and object raising verbs (verbs of influence or perception verbs) which can also be the head of a complex predicate, and hence be the target of a similar lexical rule (Abeillé, Godard, Miller & Sag 1998; Abeillé & Godard 2010).


(18) Argument attraction lexical rules for Romance restructuring verbs:

     a. Subject control verbs:
        [HEAD *verb*, ARG-ST ⟨ NPᵢ ⟩ ⊕ ⟨ VP[VFORM *infinitive*, SUBJ ⟨NPᵢ⟩, COMPS ⟨ ⟩] ⟩ ⊕ 1] ↦
        [ARG-ST ⟨ NPᵢ ⟩ ⊕ ⟨ V[*basic-verb*, VFORM *infinitive*, SUBJ ⟨NPᵢ⟩, COMPS 2] ⟩ ⊕ 2 ⊕ 1]

     b. Subject raising verbs:
        [HEAD *verb*, ARG-ST 1 ⊕ ⟨ VP[VFORM *infinitive*, SUBJ 1, COMPS ⟨ ⟩] ⟩ ⊕ 2] ↦
        [ARG-ST 1 ⊕ ⟨ V[*basic-verb*, VFORM *infinitive*, SUBJ 1, COMPS 3] ⟩ ⊕ 3 ⊕ 2]

In the input description, the verbal complement is saturated for its complements. The verb may have other complements in addition to the saturated infinitival VP, noted as list 1 in (18a) and 2 in (18b). We distinguish between subject control verbs and subject raising verbs to accommodate the case where the complement verb is subjectless, but with complements that can be attracted. In (19a), the verb *sembra* 'seems' is a raising verb, and the infinitive *piacere* 'to please' is an impersonal verb with no subject, but with a complement, realized by *gli* on the head verb *sembra* (there is another interpretation where *gli* is the complement of *sembra*, which is irrelevant).<sup>10</sup> Note that there is inter-speaker variation: *sembrare* 'to seem' is not a restructuring verb for all Italian speakers (hence % on the examples).
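The raising variant (18b) can be sketched as follows; the SUBJ/VCOMP/OTHER keys are my own encoding, and the requirement that the output complement be a *basic-verb* of the right VFORM is suppressed. Since the VP-complement entry and the complex-predicate entry coexist, clitic climbing with restructuring verbs is correctly predicted to be optional:

```python
def attract(verb):
    """Map a verb selecting a saturated infinitival VP (the input of
    (18b)) to one selecting a bare infinitive plus that infinitive's
    complements, which are appended to the verb's own ARG-ST."""
    vp = verb["VCOMP"]
    out = dict(verb)
    out["ARG-ST"] = verb["SUBJ"] + [vp] + vp["COMPS"] + verb["OTHER"]
    return out

# Subjectless 'piacere' with a dative complement, under raising 'sembra'.
piacere = {"PHON": "piacere", "SUBJ": [], "COMPS": ["gli"]}
sembra = {"PHON": "sembra", "SUBJ": [], "VCOMP": piacere, "OTHER": []}
print([a["PHON"] if isinstance(a, dict) else a
       for a in attract(sembra)["ARG-ST"]])
# ['piacere', 'gli']: the dative argument climbs, as in 'gli sembra piacere'
```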

The category of the subject is not specified: it can be an infinitival VP as well as an NP (or even a sentence); in the first case, the index is that of the situation (19c), in the second, it is the index of the nominal entity (19b). Again, the upstairs clitic *gli* corresponds to the argument of *piacere* 'to please':

<sup>10</sup>Alternatively, in a grammar with null pronouns, impersonal and unaccusative verbs in Romance languages could be analyzed as having a null pronoun subject, a representation which allows a common input for subject control and raising verbs in the Argument Attraction Lexical Rule (as in Monachesi 1998: 331).

(19) b. % [Questo regalo] gli sembra piacere. (Italian)
          this gift to.him seems please
        'This gift seems to please him.'
     c. % [Andare in vacanza] gli sembra piacere.
          go.away on vacation to.him seems please
        'To go away on vacation seems to please him.'

# **4.2 The different structures of complex predicates with restructuring verbs**

The point of this section is to show that argument attraction is compatible with different structures: complex predicate formation and structure are two different aspects of the grammar. In Romance languages, restructuring verbs can take a VP complement, or be the head of a complex predicate. In the latter case, there are two possible structures: the restructuring verbs enter either a flat structure or a verbal complex. We speak of a flat structure when the complement verb as well as the complements that it subcategorizes for are all sisters of the head. We speak of a verbal complex when the head verb and the complement verb form a constituent by themselves, to the exclusion of their complements (see Figure 4).

We contrast Italian and Spanish.<sup>11</sup> Note that in Spanish, there is variation among speakers: we describe here one usage of Spanish complex predicates.

The impossibility of preposing illustrated in (17) for both languages shows that the sequence of the complement verb and its complements does not form a constituent (a VP) when there is a complex predicate, a point made by Rizzi (1982) for Italian on the basis of a series of constructions (pied-piping, clefting, Right Node Raising, Complex NP Shift). However, the two languages differ with respect to other properties. In what follows, the fact that there is a complex predicate is indicated by the presence of a clitic on the head verb.

<sup>11</sup>In Portuguese, restructuring verb constructions are also a flat structure, but with different ordering constraints than Italian; the variety of Spanish not described here is similar to Portuguese. Except for the copula (see Section 4.4), complex predicate constructions with head verbs entering only one structure also distribute between these two structures among Romance languages: tense auxiliaries in French, Italian and Portuguese, as well as Romanian modal *a putea* 'can', are the head of a flat structure, while tense auxiliaries in the variety of Spanish described here and in Romanian enter a verbal complex (Abeillé & Godard 2010).

Figure 4: Three constituent structures for Romance restructuring verbs
to other properties. In what follows, the fact that there is a complex predicate is indicated by the presence of a clitic on the head verb.

First, adverbs occur between the restructuring verb and the infinitive in Italian (20a), but not in Spanish (20b) (though a few adverbs, such as *casi* 'nearly', *ya* 'already' and *apenas* 'barely' are possible). In Spanish, an adverb may occur after the verb and before the infinitive if the complement is a VP (20c) (examples in (20) from Abeillé & Godard 2010: 139).


	c. Juan quiere a menudo leer*lo*.
	   Juan wants often read-it
	   'Juan wants to read it often.'

Second, an inverted subject NP can occur between the two verbs of a complex predicate in Italian (21a), but not in Spanish (21b). The subject can occur postverbally in interrogative sentences. In Italian, it can occur between the two verbs with a special prosody, indicated by the small capitals in (21a), and with inter-speaker variation (Salvi 1980). In Spanish, this is not possible (except for the pronominal subject; Suñer 1982).

	- 'Is Juan beginning to understand it?'


Finally, Italian heads of complex predicates can have scope over the coordination of infinitives with their complements (22a), while this is not the case in Spanish (22b). Again, the presence of a clitic on the head verb (*lo vuole* lit. it wants, *le volvió* lit. to.him started.again) shows that this is a complex predicate construction (examples from Abeillé & Godard 2010: 136–137).

	b. * Le volvió a pedir un autógrafo y a hacer proposiciones. (Spanish)
	     to.him/her started.again to ask an autograph and to make proposals
	   Intended: 'He started again to ask him for an autograph and to make proposals to him/her.'

Constituency tests such as preposing, as in (17), show that the verbal complement is not a VP in either language. The verbal complex, in which the two verbs form a constituent without the complements, is well-suited to account for the absence of adverbs and of subject NPs, if such combinations exclude elements other than verbs (adverbs in particular). This constraint can be captured by the feature [LIGHT+], which has been used in Romance languages for other phenomena as well (Abeillé & Godard 2000; see Section 4.3).<sup>12</sup> Hence, complex predicate constructions in Spanish contain a verbal complex, while they form a flat structure in Italian containing the complement verb and its complements.

This is illustrated with examples in Figure 4, which all mean 'Marco wants to give it to Maria'. The verb takes a VP complement in Figure 4a in both languages, it is the head of a flat VP in Italian in Figure 4b, and it enters a verbal V-V complex in Spanish in Figure 4c (from Abeillé & Godard 2010: 146).

The possibility of the coordination in (22a) has been viewed as an argument in favor of a complement VP, even when there is argument attraction (Andrews & Manning 1999). The data go against such an analysis for Spanish, since the coordination is not acceptable. For Italian, although such sequences as (22a) can be analyzed as instances of coordinations of VP, they can also be instances of Non-Constituent Coordinations (NCCs; an English example would be *John gives a book to Maria and discs to her brother*; see Abeillé & Chaves 2021: Section 7, Chapter 16 of this volume). So, the question becomes: why is (22b) not an acceptable NCC in Spanish? Abeillé & Godard (2010) propose that NCCs are subject to a general constraint in Romance languages: the parallel elements of the coordination must be at the same syntactic level, otherwise the acceptability is degraded. An example is the contrast between (23a) and (23b) in Spanish. The structure of (22b), repeated in (23c), is similar to that of (23b), if it is a verbal complex ((23) from Abeillé & Godard 2010: 137, 144).

<sup>12</sup>The adverbs admissible in the Spanish verbal complex are light.

(23) a. Juan da [el libro de Proust] [a María] y [el (libro) de Camus] [a Pablo]. (Spanish)
        Juan gives the book of Proust to María and the (book) of Camus to Pablo
        'Juan gives the book by Proust to María and the book by Camus to Pablo.'
     b. ?? Juan da [el libro de Proust] [a María] y [de Camus] [a Pablo].
           Juan gives the book of Proust to María and of Camus to Pablo
        Intended: 'Juan gives the book by Proust to María and the book by Camus to Pablo.'
     c. * [Le volvió a pedir] [un autógrafo] y [a hacer] [proposiciones].
          to.him/her started.again to ask an autograph and to make proposals
        Intended: 'He started again to ask him for an autograph and to make proposals to him/her.'

In (23a), the NP *el de Camus* 'the one by Camus' is parallel to and at the same level as *el libro de Proust* 'the book by Proust', the PP *a Pablo* 'to Pablo' is parallel to and at the same level as *a María* 'to María', and the NP and the PP are both complements of *da* 'gives'. But, in (23b), *de Camus* 'by Camus' is parallel to *de Proust* 'by Proust', and not at the same level as *el libro de Proust* or as *a Pablo*: *a Pablo* corresponds to the complement of *da* 'gives' while *de Camus* corresponds to the complement of the noun *libro* 'book'. Thus, the acceptability is degraded.

If the structure of a complex predicate is that of a verbal complex in Spanish, the structure of (23c) is similar to that of (23b): *a hacer* corresponds to *a pedir*, which is the complement V of *volvió* in a V-V constituent, and is not at the same level as *proposiciones*, which corresponds to *un autógrafo*, which is outside the V-V constituent.

# **4.3 Analysis of Romance restructuring verb constructions in HPSG**

It has been shown in Section 4.1 that the different Romance languages all have complex predicate constructions, and, in Section 4.2, that, although they share some properties (such as clitic climbing and occurrence in other bounded dependencies), they also show syntactic differences amongst themselves (separability of the head and the infinitive or participle in Italian, but not in Spanish, and the possibility of coordination of the complement verb with its complements in Italian, but not in Spanish). The flexibility of HPSG grammars allows us to describe both the commonalities and the differences. The common behavior follows from the fact that they share the mechanism of argument attraction, which characterizes certain classes of verbs; the differences follow from a different phrase structure: the restructuring verb enters a flat structure in Italian (Figure 4b), while it enters a verbal complex in Spanish (Figure 4c). This analysis contrasts with that of Andrews & Manning (1999) in LFG, who propose that complex predicates in Romance languages arise when two verbs have a common domain of grammatical functions, but correspond to just one phrase structure, all these verbs taking a VP complement. It is not clear how they can account for the differences between the two languages.

Two ID schemata combining a head with its complements account for the distinction between the flat structure and the verbal complex: the usual head-complements phrase, and a different one, the head-cluster phrase, which is also used in German (see Section 5.1.2).


The *head-complements-phrase* is defined as follows:

(24) *head-complements-phrase* ⇒
     [SYNSEM|LOC|CAT [COMPS 1, LIGHT −],
      HEAD-DTR|SYNSEM|LOC|CAT|COMPS 2 ⊕ 1,
      NON-HEAD-DTRS synsem2sign( 2 ) ∧ *ne-list*]

The COMPS list is a list of *synsem* objects. It is converted into a list of signs by the relational constraint synsem2sign (see Ginzburg & Sag 2000: 34 for a similar proposal using synsem2sign). *ne-list* stands for non-empty list and this specification ensures that there is at least one element in the list of non-head daughters. The phrase structure described in (24) is general: it allows for a flat structure as well as binary structures, as in German (see Section 5.1.2). The difference between the two is that, in flat structures, the head daughter is specified as [LIGHT+], which is not the case in binary structures.
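A toy version of schema (24) may clarify how the mother's COMPS value and the non-head daughters are related; the dictionaries and the stand-in for synsem2sign are illustrative assumptions, not the actual feature geometry:

```python
def head_complements_phrase(head, dtrs):
    """The non-head daughters (an ne-list of signs) realize an initial
    sublist of the head's COMPS; the remainder 1 is the mother's COMPS,
    and the mother is [LIGHT-]."""
    assert len(dtrs) >= 1, "NON-HEAD-DTRS must be a non-empty list"
    selected = head["COMPS"][:len(dtrs)]
    # toy synsem2sign: each daughter sign realizes one selected synsem
    assert [d["SYNSEM"] for d in dtrs] == selected, "synsem2sign mismatch"
    return {"COMPS": head["COMPS"][len(dtrs):], "LIGHT": False,
            "HEAD-DTR": head, "NON-HEAD-DTRS": dtrs}

# Flat Italian VP as in Figure 5: 'vuole' has attracted dare's complements.
vuole = {"PHON": "vuole", "LIGHT": True,
         "COMPS": ["V[dare]", "NP[libro]", "PP[a Maria]"]}
vp = head_complements_phrase(vuole, [{"SYNSEM": s} for s in vuole["COMPS"]])
print(vp["COMPS"], vp["LIGHT"])   # [] False: a saturated, non-light VP
```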

Figure 5: Flat VP structure with an Italian restructuring verb

In Romance languages, the *head-complements-phrase* is usually saturated for the expected complements, but not always: list 1 in (24) is usually empty, but does not have to be (see the case of the copula in Section 4.4). An example of the flat structure with a restructuring verb is given in Figure 5.

In the flat structure, the head verb takes as complements the infinitival verb and the canonical complements expected by the infinitive, and combines with them. The VP, corresponding to the *head-complements-phrase*, is complement-saturated. The LIGHT feature (Bonami & Webelhuth 2012) renames the WEIGHT feature proposed in Abeillé & Godard (2000), as well as the LEX feature used in German (e.g. Hinrichs & Nakazawa 1989; 1994; Kiss 1995; Meurers 2000; Müller 2002; Höhle 2019). The LIGHT feature has ordering as well as structural consequences (Abeillé & Godard 2000; 2010). It is appropriate both for words and phrases. Words can be light or non-light; lexical verbs (finite verbs, participles or infinitives without complements) are light. Most phrases are non-light; in particular, the VP, that is, the phrase which combines with the subject in Romance languages, is non-light.<sup>13</sup> But some phrases can be light if they are composed of light constituents. Such is the case for the *head-cluster-phrase*.

<sup>13</sup>Note that the head-only phrase is non-light. Hence, the VP which dominates a lexical verb only is non-light.


The verbal complex corresponds to another kind of *head-complements-phrase*, called the *head-cluster-phrase*, given in (25) (see Müller 2002: 87; Müller 2021b: 39).<sup>14</sup>

This differs from the usual *head-complements-phrase* on two accounts: there is only one non-head daughter, and both daughters are [LIGHT+].

The *head-cluster-phrase* is illustrated in Figure 6: the phrase *quiere dar* corresponds to the *head-cluster-phrase* in (25), while the whole VP (*quiere dar aquel libro a María* 'wants to give that book to María') corresponds to the usual *headcomplements-phrase* in (24).

Regarding the canonical complements in the verbal complex construction, the requirement is passed up by the verbal complex, according to the description in (25) (the list 1 is non-empty). The verbal complex itself combines with the canonical complements expected by the infinitive (here, 3 and 4 ).

More has to be said regarding the clitic *lo* in the Italian sentence *Marco lo vuole dare a Maria* 'Marco wants to give it to Maria' and the Spanish sentence *Marco lo quiere dar a María* 'Marco wants to give it to María' in Figure 4. The infinitive is a basic verb: there is no difference between the complements and the arguments (except for the subject); its complement list contains an affixal element (see Section 3). Following the rule in (18a), this element is attracted to the argument list of the head verb, but it is not realized as a complement; the head verb is then a reduced verb (see Figure 7), which is the target of a morphological rule of cliticization, hence the clitic *lo* 'it' on the head verb *vuole* or *quiere* 'wants'.

It remains to ensure that Spanish restructuring verbs are characterized by a verbal complex, and Italian ones by a flat structure. In fact, nothing more has to be said for Italian, since this language lacks the *head-cluster-phrase*. We assume an additional constraint on phrases in Spanish. According to (26), if the phrase is light, it follows that the non-head daughters are also light, and, conversely, if the phrase is non-light, the non-head daughters are non-light.

<sup>14</sup>This rule is also used in Romanian. As in German, we do not specify the category of the complement (which can be a noun in Spanish, for instance). Note that Müller does not specify the LIGHT value of the non-head daughter (see (40)). This is not necessary, since the auxiliaries select for the non-head daughter and hence they can determine the LIGHT value. This is important since some auxiliaries do not require their arguments to be lexical. For example, in so-called auxiliary flip constructions, the verbal complex may contain non-verbal material. See Hinrichs & Nakazawa (1994: Section 1.4). The LIGHT value of the head daughter and the LIGHT value of the mother are not specified either in grammars of German.

Figure 6: VP with a verbal complex with a Spanish restructuring verb

(26) *phrase* ⇒ [LIGHT 1, NON-HEAD-DTRS *list*([LIGHT 1])] (used in Spanish)

The structure of the flat VP does not obey this constraint: the infinitival verb, which is a non-head daughter, is light, while the other complements are non-light (see Figure 5). When constraint (26) applies, the head of a restructuring verb cannot enter a flat structure.
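The effect of (26) can be sketched as a simple check over toy phrase representations (the encoding is mine); it licenses the Spanish verbal complex but rules out the flat VP:

```python
def obeys_26(phrase):
    """Constraint (26): the phrase and all of its non-head daughters
    share their LIGHT value (used in Spanish)."""
    return all(d["LIGHT"] == phrase["LIGHT"]
               for d in phrase["NON-HEAD-DTRS"])

flat_vp = {"LIGHT": False,                 # a VP is non-light
           "NON-HEAD-DTRS": [{"PHON": "dar", "LIGHT": True},   # bare V: light
                             {"PHON": "aquel libro", "LIGHT": False},
                             {"PHON": "a María", "LIGHT": False}]}
cluster = {"LIGHT": True,                  # verbal complex: light
           "NON-HEAD-DTRS": [{"PHON": "dar", "LIGHT": True}]}
print(obeys_26(flat_vp))   # False: flat 'quiere dar aquel libro a María'
print(obeys_26(cluster))   # True: 'quiere dar' as a V-V complex
```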

Romance languages follow the general constraints on ordering in non-headfinal languages. According to constraint (27), the verb precedes the complements it subcategorizes for. This is relevant not only for the head of the complex predicate, but also for the participle complement of the tense auxiliary or the infinitive

Figure 7: Clitic climbing with Italian and Spanish restructuring verbs


complement of a restructuring verb. Although the latter do not combine with their expected complements, they still subcategorize for them.<sup>15</sup>

(27) V[COMPS ⟨…, 1 , …⟩] < [SYNSEM 1 ] (head-initial languages)

## **4.4 The complements of the copula in Romance languages**

It is an interesting fact that, while Romance restructuring verbs enter two different structures (the flat structure and the verbal complex), the copula has the same complement structure across Romance languages (Abeillé & Godard 2001b; 2010).<sup>16</sup> Moreover, this complementation differs both from the flat structure and the verbal complex: the copula takes a non-light complement, which can be saturated or not.

The complement of the copula is underspecified: it is predicative (encoded by [PRD +]), but it can be an adjective, a noun, a preposition or a passive participle (for the passive construction, see Abeillé & Godard 2002). We illustrate clitic climbing with the same example in different Romance languages (examples from Abeillé & Godard 2010: 120).


<sup>15</sup>For more on the definition of such constraints, see Müller (2021a: Section 2), Chapter 10 of this volume.

<sup>16</sup>We concentrate on the predicative use of the copula.


(28) f. Ion îi era credincios. (Romanian)
        Ion to.him/her was faithful
        'Ion was faithful to him/her.'

The properties of the construction differentiate it clearly from tense auxiliaries and restructuring verbs. For the sake of simplicity, we restrict the examples to French, Italian and Spanish. The sequence of the head of the complement with its complements is a constituent, since, for instance, it can be dislocated and pronominalized (29) (examples in (29) and (30) from Abeillé & Godard 2010: 133–134).

(29) [Context] Is John faithful to his friends?


'Faithful to his friends, he is, more than to his political ideas.'

c. Fiel a sus amigos, lo es más que a sus convicciones políticas. (Spanish)
   faithful to his friends it is more than to his convictions political

'Faithful to his friends, he is, more than to his political ideas.'

Crucially, the construction differs from that of restructuring verbs in that the dislocated constituent can leave behind its complements (30).

(30) a. Fidèle, il l'est plus à ses amis qu'à ses convictions politiques. (French)
        faithful he it.is more to his friends than.to his convictions political
        'As for being faithful, he is to his friends more than to his political convictions.'


b. Fedele, lo è ai sui amici più che alle sue idee politiche. (Italian)
   faithful it is to.the his friends more than to.the his ideas political
   'As for being faithful, he is to his friends more than to his political convictions.'
c. Fiel, lo es más a sus amigos que a sus convicciones políticas. (Spanish)
   faithful it is more to his friends than to his convictions political
   'As for being faithful, he is to his friends more than to his political convictions.'

Similarly, the predicative complement can be extracted with its complements or it can leave them behind. Even if the complements are left behind, the predicate complement can be cliticized, as shown in (31c) (compare with examples (16) and (17) with restructuring verbs). In (31), the adjective is extracted (it corresponds to the predicative complement of *être* 'to be') as part of a concessive adjunct (examples (31) and (32) from Abeillé & Godard 2010: 146, 148).

(31) [Context] Is he really faithful to his friends?


'As faithful as he is to his friends, he does not lose sight of his interests.'

c. Aussi fidèle qu'il leur soit, il ne perd pas de vue ses intérêts.
   as faithful as.he to.them is he NE lose not of sight his interests
   'As faithful to them as he is, he does not lose sight of his interests.'


Moreover, an adverb may intervene between the copula and the adjective, not only in French or Italian, where it is expected (it is possible with tense auxiliaries and restructuring verbs), but also in Spanish, where it would not be expected if the structure were the same as with restructuring verbs. We illustrate this possibility with cliticization, in order to make the contrast with restructuring verbs clearer.


The data show that, unlike restructuring verbs, the copula in Romance languages has only one complement structure. Abeillé & Godard (2002; 2010) propose that the copula takes a "phrasal" complement, which can be saturated or not. This analysis is implemented by saying that the predicative complement is underspecified with respect to complement saturation or attraction, and that it is non-light in all cases. If the predicative complement is a lexical item, a unary branching phrase makes it [LIGHT–] (see Figure 9).

(33) Description of the copula in Romance languages:
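One way to spell this out, parallel to the descriptions of tense auxiliaries and restructuring verbs above (a sketch; 1 is the shared subject list, 2 the list of complements attracted from the predicative complement):

    [HEAD *verb*
     ARG-ST 1 ⊕ 2 ⊕ ⟨ XP[PRD +, LIGHT −, SUBJ 1 , COMPS 2 ] ⟩]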


Like tense auxiliaries, the copula is a subject raising verb, hence the identical value 1 for its subject and that of its predicative complement, which allows it to be empty. Its complement differs from that of a tense auxiliary (8b) in several respects: it is predicative, which is not the case for tense auxiliaries, and it is non-light; in addition, it is not specified for its category.<sup>17</sup> Being non-light, it may have

<sup>17</sup>The predicative complement in French can be a PP (*Il est contre cette décision.* 'He is against this decision.'). However, the complement of a preposition cannot be attracted or extracted, in a general way. Thus, a preposition alone can be a predicative complement only when its complement is unexpressed and interpreted anaphorically (*Il est contre.* 'He is against (it).').


combined with its complements or some of them, while the complement of the auxiliary is light, hence all its complements are attracted (see Figures 8, 9).<sup>18</sup>

Figure 8: The Romance copula with a saturated complement

Figure 9 illustrates a case where the affix complement of the adjective is attracted to the copula. For cliticization and the notion of reduced verb, see Section 3.

Regarding the point made in Section 4, that argument attraction is compatible with different structures (a flat structure or a verbal complex), what the Romance copula shows is that still another structure is possible: the copula can inherit arguments from a phrasal complement.

# **5 Complex predicates and word order**

In certain languages, a complex verb construction signals itself essentially by properties of word order. This is the case for instance in German (Hinrichs & Nakazawa 1989; 1994; Kiss 1994; 1995; Hinrichs & Nakazawa 1998; Kathol 1998; Hinrichs & Nakazawa 1999; Kathol 2000; Meurers 2000; Meurers 2002; De Kuthy & Meurers 2001; Müller 2002; 2003; 2012) and Dutch (Rentier 1994; Bouma & van Noord 1998), as well as Korean (Sells 1991; Chung 1998; Yoo 2003; Kim 2016). We concentrate on coherent constructions in German, and on Korean auxiliaries.

## **5.1 Verbal complexes in German**

The contrast in German between coherent and incoherent constructions is reinterpreted in terms of complex predicate formation: coherent constructions

<sup>18</sup>Note that the complements included in a predicative PP are not attracted by the copula. This is assured by a constraint on prepositions saying that ARG-ST elements are of type *canonical*.


Figure 9: Clitic climbing with the Romance copula

constitute a complex predicate, as does the copula with predicative adjectives. In coherent constructions, the two predicates cannot be separated and form a predicate complex.

### **5.1.1 Coherent and incoherent constructions in German**

Among verbs with an infinitival complement, German distinguishes between coherent and incoherent constructions (Bech 1955). We speak of constructions rather than verbs, because, although the constructions are triggered by lexical properties of verbs, many verbs can be constructed either way. Verbs entering coherent constructions, obligatorily or optionally, belong to different classes: they may be tense auxiliaries (where the verbal complement is an infinitive or a participle), modals, subject and object raising verbs, subject and object control verbs, copulas, predicative verbs, verbs entering resultative constructions, or particle verbs (see Müller 2002: Chapters 2, 5 and 6).

Coherent and incoherent constructions differ with respect to several properties (separability of the head verb and the infinitive, extraposition of the infinitive with its complements, pied-piping in relative clauses and scope of adjuncts). In incoherent constructions, an adverb such as *nicht* 'not' may occur between the two verbs as in (34a) (from Müller 2002: 42), the infinitival phrase can be extraposed


(compare (34b) and (34c)), and the infinitive may be pied-piped with its relative pronoun complement as in (34d) (examples from Hinrichs & Nakazawa 1998: 117–118).

- b. … dass Peter Maria das Auto zu kaufen überredet
     that Peter Maria the car to buy persuades
     'that Peter persuades Maria to buy the car'
- c. … dass Peter Maria überredet, [das Auto zu kaufen]
     that Peter Maria persuades the car to buy
     'that Peter persuades Maria to buy the car'
- d. Das ist das Auto, [das zu kaufen] er Peter überreden wird
     that is the car which to buy he Peter persuade will
     'That is the car, which he will persuade Peter to buy.'

On the other hand, coherent constructions, of which the combination of the future auxiliary *wird* 'will' in (35a) or the raising verb *scheinen* 'to seem' with an infinitival complement in (35d) are typical examples, do not allow for a nonverbal element between the two verbs, as shown in (35b), nor for extraposition of the infinitive with its complements, as shown in (35c) and (35e) (examples (35a), (35c), (35d) and (35e) from Müller 2002: 43), nor for pied-piping of the infinitive in relative clauses (35f) and (35g) (examples adapted from Hinrichs & Nakazawa 1999: 66).<sup>19</sup>

- b. \* … dass Karl das Buch lesen nicht wird
     that Karl the book read not will
     Intended: 'that Karl will not read the book'
- c. \* … dass Karl wird das Buch lesen
     that Karl will the book read
     Intended: 'that Karl will read the book'

<sup>19</sup>The head verb in coherent constructions is italicized.



Scrambling of the complements of the two verbs, or of the subject of the head verb with the complements of the infinitival, is possible in a coherent construction. In (36a) the complements of *sehen* 'see' (*Peter*) and of *kaufen* 'buy' (*das Auto* 'the car') are not interleaved. In (36b), *Peter*, the complement of *sehen*, occurs between *das Auto*, which is the complement of *kaufen*, and *kaufen* (example (36b) from Hinrichs & Nakazawa 1998: 117).

- b. … dass er das Auto Peter kaufen *sehen* *wird*
     that he the car Peter buy see will
     'that he will see Peter buy the car'

In the complex predicate approach of this chapter, these data point to the following analysis: incoherent constructions involve a saturated VP complement, while coherent constructions do not; rather, they involve a complex predicate, with a verb attracting the complements of its complement. We assume here a verbal complex for the complex predicate. Figure 10a represents example (34b), and Figure 10b represents example (36b).

### **5.1.2 Coherent constructions in HPSG**

One might wonder whether it is possible to analyze the data in terms of word order instead of structure: a verb governing a coherent construction would trigger

Figure 10: Incoherent and coherent constructions in German


a modification of the ordering domain. More precisely, it would induce domain union of the two ordering domains associated with the two verbal projections (see Müller 2021a: Section 6, Chapter 10 of this volume for a discussion of order domains). Usually, the domain in which constituents are ordered is identical with the phrase or the sentence which dominates them. In the linearization approach (Reape 1994), dominance and ordering can be distinguished. In certain circumstances, the domain for ordering is larger than the domain of constituency, so that the elements belonging to different phrases can be reordered and interleaved, a phenomenon called domain union. Domain union could be responsible for the order in (36b): the structure would be the same as in incoherent constructions (see Figure 10a), but the ordering domain would be the whole sentence.

The existence of the remote (or long) passive goes against such an analysis (Hinrichs & Nakazawa 1994: 140–144; Kathol 1998: Section 5.2; Müller 2002: 94, 136–138, 154–157). A complex predicate construction can be passivized in such a way that the subject (in the nominative case) of the passive auxiliary corresponds to the object of the active infinitive complement. An (impersonal) passive construction like (37a) with an infinitival VP containing an accusative object (*den Wagen* 'the car') alternates with a coherent construction such as (37b), with a corresponding nominative (examples (37a) and (37b) from Müller 2002: 137, (37c) and (37d) from Müller 2003: 40).

- b. … weil der Wagen oft zu reparieren *versucht* *wurde*
     because the car often to repair tried was
     'because many attempts were made to repair the car'
- c. Karl darf nicht versuchen zu schlafen.
     Karl is.allowed not try to sleep
     'Karl is not allowed to try to sleep.'
     'Karl is allowed to not try to sleep.'
- d. Karl darf versuchen, nicht zu schlafen.
     Karl is.allowed try not to sleep
     'Karl is allowed to try not to sleep.'

In (37a), the infinitival VP is extraposed. In (37b), there is no infinitival VP, as shown by the position of the adverb *oft* 'often', which occurs before *zu reparieren* 'to repair', while modifying *versucht* 'tried'. In a coherence field, an adverb can


scope over any of the verbs that belong to it.<sup>20</sup> In (37c), *zu schlafen* 'to sleep' is not part of the coherent construction, because it is extraposed; *nicht* 'not' can have scope over *darf* 'is allowed' or *versuchen* 'to try', not over *schlafen* 'to sleep'. In (37d), *nicht* belongs to the extraposed infinitival; accordingly, it can only scope over that. The fact that *oft* can scope over *versucht* 'tried' in (37b) shows that they belong to the same coherence field. This means that *zu reparieren* 'to repair', *versucht* 'tried' and *wurde* 'was' form a verbal complex, in which the passive auxiliary *wurde* combines with *zu reparieren versucht*. Since the passive participle *versucht* 'tried' attracts the complement of *reparieren* 'to repair', *zu reparieren versucht* behaves like a passivized transitive verb and together with the passive auxiliary a verbal complex results that selects for a subject that corresponds to the accusative object of *zu reparieren*.

German differs from Romance languages in not distinguishing structurally between the subject and the complements of finite verbs (Pollard 1996): the subject of finite verbs is considered as a complement, and is introduced by the same rule. The structure of the sentence is usually represented as having binary branching daughters (see Figure 10). The constraint is as follows (Müller 2021b: 21).<sup>21</sup>
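In rough outline (a simplified sketch of Müller's constraint; 3 is the complement being realized, and splitting the COMPS list into three parts allows 3 to occupy any position on it):

    *head-complement-phrase* ⇒
    [SYNSEM [LOC|CAT|COMPS 1 ⊕ 2 , LIGHT −]
     HEAD-DTR|SYNSEM|LOC|CAT|COMPS 1 ⊕ ⟨ 3 ⟩ ⊕ 2
     NON-HEAD-DTRS ⟨ [SYNSEM 3 ] ⟩]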



Following constraint (38), the head combines with one complement at a time, noted as 3 . The presentation of the list as composed of three parts, with the relevant one in any position, allows for a free order. The phrase combining a head with a complement is [LIGHT−].<sup>22</sup> The structure of (39) is exemplified in Figure 11 (Müller 2021b: 22).

<sup>20</sup>A coherence field consists of all verbs entering a coherent construction and all arguments and adjuncts depending on the involved verbs.

<sup>21</sup>The description in (38) differs minimally from that in (24). Following (38), the complements are discharged one at a time from the complements list (binary structure), while (24) allows for several complements at the same level as well as a binary structure. Thus (38) is a more constrained version of (24). Similarly, the description needed for representing flat VPs in Romance languages is a subtype of (24), specifying the head daughter as [LIGHT+].

<sup>22</sup>The feature LIGHT is the equivalent of LEX used in German studies, although the properties of light elements may differ depending on the language. It does not belong to LOCAL features in (38), because an extracted constituent may differ from its trace as regards lightness (Müller 1996; 2021b; see Borsley & Crysmann (2021), Chapter 13 of this volume for discussion of extraction).


(39) … weil das Buch jeder kennt
     because the book everybody knows
     'because everybody knows the book'

Figure 11: Clause structure in German

Turning to complex predicates, they form a verbal complex phrase: they cannot be separated by an adverb or an NP, as shown in (35b) and (35c). Given the structure of the German sentence with binary branching, illustrated in Figure 10, this verbal complex only shows up structurally when there is a series of verbs attracting the complements of their complements, as in (36) (see Figure 10b).

The phrase structure constraint allowing complex predicates is as in (40) (Müller 2012; Müller 2021b: 39). It is called *head-cluster-phrase*, rather than *verbal-complex-phrase*, because it is not specialized for verbal heads (see also (25)).23,24


<sup>23</sup>Following Hinrichs & Nakazawa (1994: 23) and De Kuthy & Meurers (2001: 177), but contrary to Müller (Müller 2005: 23; Müller 2021b), we mention the lightness of the mother.

<sup>24</sup>The description of the *head-cluster-phrase* in (40) is the same as that in (25), only more general, (25) being specified as having a verbal head.


We illustrate the analysis with sentence (36b) (… *dass er das Auto Peter kaufen sehen wird* 'that he will see Peter buy the car'), elaborating on Figure 10b. The description of *werden* (the future auxiliary), a subject raising verb and a verb constructing coherently, is as in (41) (from Müller 2021b: 39), and that of *sehen* 'to see', an object raising verb and an obligatorily coherent verb, is as in (42) (adapted from Müller 2002: 102). The subject and other arguments are raised from the embedded verb. The infinitive is analyzed as having the feature [VFORM *bse*], where *bse* stands for *base*. What forces these verbs to be part of a *head-cluster-phrase* is that their infinitive complement is [LIGHT+].
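A sketch of what the description of *werden* amounts to ( 1 is the embedded verb's SUBJ list, 2 its COMPS list; the formulation simplifies Müller's):

    *werden* (future auxiliary):
    [HEAD *verb*
     ARG-ST 1 ⊕ 2 ⊕ ⟨ V[VFORM *bse*, LIGHT +, SUBJ 1 , COMPS 2 ] ⟩]

The description of *sehen* in (42) is parallel, except that the embedded verb's subject is realized as an accusative object of *sehen*.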


As mentioned above, subjects of non-finite verbs are represented under SUBJ. Since the verbs above attract all arguments, the SUBJ value and the COMPS value are concatenated and represented on the ARG-ST list of the governing verb. Hence, these lexical items are parallel to the ones given for the Romance languages (see (8b) and (18)), with the exception that the selected verb is the last argument in German (SVO vs. SOV) and that German always attracts the arguments from the COMPS list rather than from ARG-ST. The reason for attracting arguments from COMPS is so-called partial verb phrase fronting (Müller 1996): verbs may be combined with a subset of their complements in fronted position, and only the remaining complements are attracted. Since ARG-ST contains the complete list of arguments, attraction has to take COMPS as the source.

Sentence (36b) is represented in Figure 12.

### **5.1.3 The German copula**

The copula in German, with an adjectival argument, is also the head of a complex predicate.<sup>25</sup> The subject of the copula and the complements of the adjectives can be permuted (examples from Müller 2002: 68; see (36) for coherent verbs):

<sup>25</sup>As in Romance languages, the German copula accepts nominal and prepositional predicative complements. However, they are complement saturated.


Figure 12: Coherent construction with verbal complexes in German

- b. … dass dem Minister die Sache ganz klar war
     that the.DAT minister the.NOM matter completely clear was
     'that the matter was completely clear to the minister'

Adverbs can have different scopings: in (44) (from Müller 2002: 68), *immer* 'always' can modify the modal or the adjective. This follows if there is just one coherence field and both the modal and the copula are the head of a complex predicate (see Section 5.1.2, example (37b) for verbs constructing coherently).


(44) … weil der Mann ihr immer treu sein wollte
     because the.NOM man her.DAT always faithful be wanted.to
     'because the man always wanted to be faithful to her'
     'because the man wanted to be faithful to her forever'

Müller (2002) also shows that the copula does not take a saturated AP complement. Contrary to a construction with a verb constructing incoherently, this AP cannot be extraposed, as shown in (45b), or pied-piped with a relative pronoun, as shown in (45d) (from Müller 2002: 70; compare with (34c), (34d)).

- b. \* Karl ist gewesen auf seinen Sohn stolz.
     Karl is been on his son proud
     Intended: 'Karl was proud of his son.'
- c. der Sohn, auf den Karl stolz gewesen ist
     the son on whom Karl proud been is
     'the son of whom Karl was proud'
- d. \* der Sohn, auf den stolz Karl gewesen ist
     the son on whom proud Karl been is
     Intended: 'the son of whom Karl was proud'

In addition, the German copula, like the Romance copula, is a subject raising verb: the semantic properties of the subject depend on the adjective (a human is proud or faithful, and a matter is clear, as shown also by the nominalizations, cf. the man's faithfulness, the clarity of the matter); moreover, the sentence can be subjectless (from Müller 2002: 72):

(46) Am Montag ist schulfrei.
     at.the Monday is school.free
     'There is no school on Monday.'

The description of the German copula, restricted to its predicative use and to its syntactic part, is as follows (Müller 2009: 226):<sup>26</sup>

<sup>26</sup>As Müller (2009: 227) notes, this copula also works for English and other Germanic SVO languages. Since these languages do not have a Head-Cluster Schema, the copula has to be used in the Head-Complement Schema, which requires complements to be saturated, hence 2 is the empty list for English and other Germanic SVO languages.


(47) *sein* (copula):
    [HEAD *verb*
     ARG-ST 1 ⊕ 2 ⊕ ⟨ [PRD +, SUBJ 1 , COMPS 2 ] ⟩]

It differs from the Romance copula in not specifying the lightness of its predicative complement. So, while German allows for the formation of a predicate complex (a *head-cluster-phrase*) with predicative adjectives and normal head-complement structures with predicative NPs and PPs, the Romance copula only allows XP arguments, which can be complement saturated or not.

## **5.2 Argument attraction with Korean auxiliaries**

Like German complex predicates, Korean auxiliary constructions allow the arguments of the auxiliary and its verb complement to be interleaved. Other properties (case marking, passivization) clearly show that the auxiliary forms a complex predicate with its verbal complement. Control verbs also allow for scrambling, but they do not exhibit the same behavior as auxiliaries, and we will not consider them as heads of complex predicates. As in German again, the auxiliary and its verbal complement constitute a verbal complex.

### **5.2.1 Properties of Korean auxiliaries**

Korean resembles German in that a complex predicate is associated with word order properties (see Sells 1991; Chung 1998; Yoo 2003; Kim 2016). We illustrate here the case of auxiliaries.<sup>27</sup>

Korean auxiliaries semantically resemble aspectual or modal verbs rather than tense auxiliaries: they include such verbs as *iss-* 'to be in the process/state of', *chiwu-* 'to do resolutely', *siph-* 'to want', but also the verb of negation *anh-* 'not' (see also Kim 2021: Section 4, Chapter 18 of this volume). They bear the tense marking for the sentence (48a), impose a certain ending on their verbal complement (*-e* in (48a)), and, when they have a use as ordinary verbs (48b), they have an argument structure which is absent in their auxiliary use (examples from Kim 2016: 85–86).

(48) a. Mia-ka wul-e pely-ess-ta.
        Mia-NOM cry-CONN end.up-PST-DECL
        'Mia ended up crying.'

<sup>27</sup>Chung (1998) also considers control verbs to be the head of complex predicates, and Kim's (2016) study, which excludes control verbs, includes serial verbs and light verb constructions.


b. Mimi-nun congi-lul hyucithong-ey pely-ess-ta.
   Mimi-TOP paper-ACC trash.can-LOC throw.away-PST-DECL
   'Mimi threw away the paper in the trash can.'

In (48b), the verb has three arguments: agent subject, theme object, and location complement. This argument structure is absent in (48a).

Consider the sentences in (49). There is no evidence of scrambling in (49a): the subject *Maryka* ('Mary' + nominative) starts the sentence, and the complement of the verb *ilkko* 'read' immediately precedes it. However, in (49b), the subject of the head verb *issta* 'be in the process of', namely *Maryka*, occurs between the complement of *ilkko*, namely *ku chaykul* ('the book' + accusative), and the verb *ilkko* itself.

- b. Ku chayk-ul Mary-ka ilk-ko iss-ta.
     the book-ACC Mary-NOM read-CONN be.in.the.process.of-DECL
     'Mary is in the process of reading the book.'

A priori, these data could be explained in two ways: either the auxiliary always takes a VP complement, and scrambling is due to linearization, in which case the domains of the two verbs are unioned (see Reape 1994 and also Müller 2021a: Section 6, Chapter 10 of this volume); or there is a complex predicate: the complement of the embedded verb (*ku chaykul* 'the book' + accusative) is attracted by the auxiliary verb.

There are several properties which show that auxiliaries attract their verbal complements' arguments. First, the presence of the auxiliary allows for case alternation: the argument of a verb like *mek-* 'to eat' is assigned accusative case, as shown in (50a); however, when the verb is the complement of the auxiliary verb *siph-* 'to want' in (50b), it can be either accusative or nominative (examples (50) from Kim 2016: 87).

- b. Mimi-ka sakwa-lul/ka mek-ko siph-ess-ta.
     Mimi-NOM apple-ACC/NOM eat-CONN wish-PST-DECL
     'Mimi would like to eat an apple.'

Given that case assignment is a local phenomenon, and a verb does not influence the case of the complement of its complement, this indicates that *sakwa-*


'apple' becomes the complement of the auxiliary (see also Yoo 2003). Moreover, in Korean, a negative polarity item such as *amwukesto* 'anything' is licensed by a clause-mate negated element. (51) provides examples. (51a) and (51b) show that the negative verb *anh-* allows this negative polarity item as the argument of *mek-* 'to eat', the complement of the auxiliary *siph-* 'to want' (examples from Kim 2016: 91). On the other hand, this negative polarity item is not licensed when the negated verb is *seltukha-* 'to persuade', which is not an auxiliary (51c).

- b. Mimi-nun amwukes-to mek-ko siph-ci anh-ass-ta.
     Mimi-TOP anything-also eat-CONN wish-CONN not-PST-DECL
     'Mimi did not feel like eating anything.'
- c. \* Mimi-lul amwukes-to mek-tolok seltukha-ci anh-ass-ta.
     Mimi-ACC anything-also eat-CONN persuade-CONN not-PST-DECL
     Intended: '(We) did not persuade Mimi to eat anything.'

Finally, the same argument can be levelled against an analysis which appeals to linearization, as above in German (Section 5.1.2): so-called long passivization is possible with certain auxiliaries like *chiwu-* 'to do resolutely', which cannot be accounted for by appeal to linearization and domain union (examples from Chung 1998: 164).<sup>28</sup> (52a) exemplifies the active sentence, and (52b) the passive one. In (52a), *malssengmanhun solul* 'the troublesome cow' is the complement of the complement verb *phal-* 'to sell'. In (52b), *malssengmanhun soka* is the subject of the passivized verb *chiwe ciessta*.

- b. … chiw-e ci-ess-ta.
     sell-CONN do.resolutely-CONN PASS-PST-DECL
     'The troublesome cow was resolutely sold (by the farmer).'

<sup>28</sup>Such passives are judged unnatural by native speakers, hence the question mark.


Since passivization only affects the complement of the verb which is itself passivized, it follows that *malssengmanhun solul* 'troublesome cow' is the complement of the auxiliary in (52a).

The scrambling data with control verbs, as in (53), are very similar to those with auxiliaries (examples from Chung 1998: 189–190). There is no scrambling in (53a): the dative complement of the head verb is followed by the other complement, a VP. However, in (53b), the subject of the head verb (*Maryka* 'Mary' + nominative) occurs between the complement of the complement verb (*ku chaykul* 'the book' + accusative) and the dative complement of the head verb (*Johnhanthey* 'John' + dative).


However, we do not observe case alternation in this case, and control verbs fail to allow the negative polarity item *amwukesto* 'anything' as the complement of the verb complement (Kim 2016: 91).

- b. \* Mimi-lul amwukes-to mek-tolok selkhuta-ci anh-ass-ta.
     Mimi-ACC anything-also eat-CONN persuade-CONN not-PST-DECL
     Intended: 'We did not persuade Mimi to eat anything.'

Accordingly, we follow Kim (2016: 93–94) in not analyzing control verbs as heads of complex predicates. They take VP complements, and scrambling in (53) must be due to a different process (that is, domain union, as in Lee 2001; see Reape 1994).


### **5.2.2 Korean auxiliaries and the verbal complex**

It has been shown in this chapter that different structures could be associated with argument attraction. Korean auxiliaries are the head of a verbal complex (Chung 1998; Kim 2016). The main fact is that nothing can intervene between the two verbs, for instance no parenthetical expression, such as *hayekan* 'anyway', as illustrated in (55) (examples from Chung 1998: 162). This contrasts with control verbs. In (56), the adverb *cengmal* 'really' can occur before the embedded verb, or between the two verbs (example (56) from Kim 2016: 93).

- b. \* Mary-ka sakwa-lul mek-ko hayekan iss-ta.
     Mary-NOM apple-ACC eat-CONN anyway be.in.the.process.of-DECL
     Intended: 'Anyway, Mary is eating an apple.'

In addition, there is evidence that the verb complement of an auxiliary and its complement do not form a constituent. While an NP may occur after the head verb in a so-called afterthought construction (57a), this is not possible for the embedded verb *mek-* with its complement (57b) (from Chung 1998: 162).

- b. \* Mary-ka iss-ta, sakwa-lul mek-ko.
     Mary-NOM be.in.the.process.of-DECL apple-ACC eat-CONN
     Intended: 'Mary is in the process of eating an apple.'

These data point to a verbal complex (see Section 4.2). However, before coming to this conclusion, we must show that the two verbs do not form a compound word. No (1991) (summarized in Chung 1998, Kim 2016) presents arguments to the effect that they combine in the syntax. The main one relies on the use of delimiters. A delimiter (such as *-man* 'only' or *-to* 'also') can combine with the


embedded verb (e.g., *mekkoman issta* 'to be only eating'). Delimiters are a syntactic phenomenon, not limited to verbal morphology. Thus, the head auxiliary and the complement verb form a verbal complex.

### **5.2.3 Korean auxiliaries in HPSG**

Given the free word order in Korean (except for the verb), there are two ways of representing the sentence: either there is a flat structure (except for the verbal complex), where all the arguments, subject and complements, are sisters of each other (see, among others, Chung 1998 for Korean), or there is a binary branching structure (see Kim 2016 for Korean). We adopt the flat structure here since the differences between the two approaches are irrelevant for the purpose of this chapter (but see Müller 2021a: Section 3, Chapter 10 of this volume for binary branching).

The general schema for the sentence is given in (58), adapted from Chung (1998: 178).<sup>29</sup>
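In rough outline, the schema can be sketched as follows ( 1 and 2 are the head daughter's SUBJ and COMPS values; synsem2sign is the relational constraint described below):

    [SYNSEM|LOC|CAT [SUBJ ⟨⟩, COMPS ⟨⟩]
     HEAD-DTR|SYNSEM|LOC|CAT [SUBJ 1 , COMPS 2 ]
     NON-HEAD-DTRS synsem2sign( 1 ⊕ 2 )]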

This schema combines a head with its subject and its complements in one go. Since no LP constraints are formulated, subjects and objects can be scrambled and permutations are accounted for. The SUBJ list and the COMPS list contain *synsem* elements. These lists are appended into one list, which is then converted into a list of signs by the relational constraint synsem2sign. A further constraint – not given in (58) – requires that the non-head daughters must be [LIGHT−].<sup>30</sup>

<sup>29</sup>This is an instance of a more general schema, needed independently for VSO languages, and subject inversion in English (Pollard & Sag 1994: 388; Ginzburg & Sag 2000: 231–232).

<sup>30</sup>See Müller (2005: 23) and Müller (2021b: Section 2.2.4) for an explicit formulation of such a constraint in a grammar of German.


This ensures that arguments of auxiliaries cannot be realized in flat structures licensed by (58) since auxiliaries select for LIGHT+ complements.

The lexical item of the auxiliary *issta* 'be in the process of' in (59) is provided in (60):<sup>31</sup>


    [FORM ⟨*iss-ta*⟩
     HEAD *verb*[AUX +]
     SUBJ 1
     COMPS 2 ⊕ ⟨ V[VFORM *ko*, SUBJ 1 , COMPS 2 , LIGHT +] ⟩]

Auxiliaries attract both the subject ( 1 ) and the complements of their verbal complement (list 2 ). The subject value is indicated as 1 , rather than ⟨ 1 ⟩, because the subject is not always realized in Korean. To indicate which ending it imposes on its complement, we use the feature VFORM, thus allowing for the selection of the appropriate ending by the auxiliary (Chung 1998, Kim 2016). So, the verb *issta* selects the ending *-ko* for the verbal complement, and *ilkko* 'read', whose VFORM value is *ko*, is appropriate.

The verbal complex is headed by an auxiliary verb, which is [AUX +], while other verbs are [AUX –]. Thus only auxiliaries can enter this structure. The schema for the verbal complex is given in (61). The verbal complex is [LIGHT+] and made up of two verbs, also [LIGHT+] (see Section 4.3).
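A sketch of the schema, restricted to auxiliary heads (a simplification; 1 is the list of attracted complements, 2 the verbal complement saturated inside the cluster):

    *head-cluster-phrase* (Korean) ⇒
    [SYNSEM [LOC|CAT|COMPS 1 , LIGHT +]
     HEAD-DTR|SYNSEM [LOC|CAT [HEAD *verb*[AUX +], COMPS 1 ⊕ ⟨ 2 ⟩], LIGHT +]
     NON-HEAD-DTRS ⟨ [SYNSEM 2 [LIGHT +]] ⟩]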

<sup>31</sup>Kim (2016: 94–95) argues that complex predicate formation in Korean results from a Head-LEX construction that ensures that the COMPS list of the mother is identical to the COMPS list of the verb daughter that is the complement to the auxiliary. For reasons of space and to make a comparison between Korean complex predicate formation and complex predicate formation in Romance, German, and Persian easier, we adopt a lexical analysis of complex predicate formation in Korean, as proposed in Chung (1998).


(61) is an instance of the more general description in (25), restricting the availability of the phrase to auxiliaries. The verbal complex schema saturates the last element of the COMPS list of the head daughter. In this way it is parallel to the head-subject-complements phrase. The only difference is that the argument that is combined with the auxiliary is [LIGHT+] as is required by the auxiliary. The SUBJ list is not mentioned in the constraints on *head-cluster-phrase*. That the SUBJ value of the head daughter is identical to the SUBJ value of the mother follows from constraints on more general types that are inherited (Abeillé & Borsley 2021: 23, Chapter 1 of this volume).

The structure of sentence (59) is represented in Figure 13.

The structure of (62), with a series of two auxiliaries, is represented in Figure 14 (adapted from Chung 1998: 171).

(62) Mary-ka ku chayk-ul ilk-ke po-ko iss-ta.
     Mary-NOM the book-ACC read-CONN try-CONN be.in.the.process.of-DECL
     'Mary is in the process of giving the book a trial reading.'

The verb *issta* 'be in the process of' takes as its complement the verbal complex *ilke poko* 'try to read', whose head is *poko* 'try'. The verb *poko*, being an auxiliary like *issta*, takes as its complement the verb *ilke*, attracting its subject and complements, which are transmitted to the verbal complex *ilke poko*; *ilke poko* saturates the verbal complement expected by *issta*, and transmits the subject and complements to the head auxiliary (see (60)).

The head comes last in Korean, except in the afterthought construction exemplified in (57a), which requires an additional mechanism. Constraint (63) mirrors constraint (27) for Romance languages.

(63) [SYNSEM 1 ] < [COMPS ⟨…, 1 , …⟩] (head-final languages)

This constraint holds for the verbal complex, in which the head verb follows the complement verb.


Figure 13: Clause structure with a verbal complex in Korean

# **6 Light verb constructions in Persian: Syntax and morphology, syntax and semantics**

Light verb constructions constitute the third guise of complex predicates. They are characterized semantically: the verb and the second predicate together constitute a semantic predicate. For instance, the French expression *faire une proposition* 'to make a proposal', which combines a semantically light verb and a noun, is close to *proposer* 'to propose'. They have been studied in HPSG for Korean (Ryu 1993; Lee 2001; Choi & Wechsler 2002; Kim 2016). We focus here on Persian light verb constructions, which form a rich class and tend to replace simplex verbs.

## **6.1 What are complex predicates in Persian?**

Persian simplex verbs constitute a small closed class of about 250 members, only around 100 of which are commonly used. Speakers resort to complex predicates, sequences of a light verb and a preverbal element belonging to various categories (adjective, noun, particle, prepositional phrase). Following Bonami & Samvelian (2010) and Samvelian (2012), such sequences are "multi-word expressions", that is, they are made up of several words, which, together, form a lexeme.


Figure 14: Clause structure with verbal complexes in Korean

Several properties show that the elements are independent syntactic units (Karimi-Doostan 1997; Megerdoomian 2002; Samvelian 2012). We concentrate on noun + verb combinations, i.e. complex predicates in which the preverbal elements are nouns. In what follows, we simply refer to these nominal elements in the complex predicates as "nouns". All inflection is prefixed or suffixed on the verb, as is the negation in (64), and never on the noun, i.e. the nominal part of the complex predicate.

(64) *Dast* be gol-hā na-zan.
     hand to flower-PL NEG-hit
     'Don't touch the flowers.'

The two elements can be separated by the future auxiliary, or even by clearly syntactic constituents, like the complement PP in (64). Both the noun and the verb can be coordinated, as shown in (65) and (66) respectively (from Bonami & Samvelian 2010: 3), where the coordinations are indicated by the brackets.



The noun can be extracted, as in the topicalization in (67), where the sign – indicates where the non-extracted noun would have occurred.

(67) *Dast* goft=am [be gol-hā – na-zan].
     hand said=1SG to flower-PL – NEG-hit
     'I told you not to touch the flowers.'

The fact that the noun is linked to a position belonging to a verbal complement (indicated by the brackets) shows that this is extraction, and not simply variation in order. Complex predicates can also be passivized. In this case, the nominal element of the complex predicate (*tohmat* 'slander' in (68a)) becomes the subject of the passive construction (68b), as does the object of a transitive construction (from Samvelian 2012: 251). The nominal part of the complex predicate is italicized in the examples.

- b. Be Omid *tohmat* zade šod.
     to Omid slander hit.PST.PTCP become
     'Omid was slandered.'

There is evidence that the verb and the nominal element in a complex predicate share one argument structure. In (69a), the verb *dādan* 'give' takes two complements, the noun *āb* 'water' and the PP *be bāqče* 'to garden', while in (69b) the combination of *dādan* and the noun *āb* takes a direct object, which is marked with *=rā*: in (69b), the noun *āb* and the verb *dād* 'gave' form a complex predicate.

(69) a. Maryam be bāqče *āb* dād.
        Maryam to garden water gave
        'Maryam watered the garden.'


b. Maryam bāqče=rā *āb* dād.
   Maryam garden=RA water gave
   'Maryam watered the garden.'

Other properties show that the combination of the two elements, here a noun and a verb, behaves like a lexeme (Bonami & Samvelian 2010). Such combinations feed lexeme formation rules: for instance, the suffix *-i* forms adjectives from verbs: *xordan* 'eat' > *xordani* 'edible', and the same is possible with complex predicates, as shown in (70); perfect participles can be converted into adjectives by adding the suffix *-e*, and this also applies to complex predicates, as shown in (71) (see also Section 6.2; from Bonami & Samvelian 2010: 5).


The meaning of the complex predicate is often a specialization of the predictable meaning of the combination: *dast dādan* (lit. 'hand give') means 'shake hands', *čāqu zadan* (lit. 'knife hit') means 'stab', *šāne zadan* (lit. 'comb hit') means 'comb'. Each specialized meaning has to be learned in the same way as that of a lexeme. Analogy often plays a role in the creation of new lexemes, and this is also true of complex predicates. For instance, the family of complex predicates expressing manners of communication goes from *telegrām zadan* 'telegraph', where hitting *(zadan)* is involved, to cases where hitting is not clearly involved: *telefon zadan* 'phone', *imeyl zadan* 'email', *esemes zadan* 'text', etc.

These complex predicates raise two problems: a morpho-syntactic one and a semantic one. To solve them, we rely crucially on the same property of HPSG as in the preceding syntactic cases, that is, the view of heads as sharing information with their expected complements.

## **6.2 Complex predicates and morphological processes**

Although Persian complex predicates are combinations of words, they may feed (some) derivational rules; see Section 6.1, examples (70) and (71). We analyze


here what appears to be a nominalization rule, studied in Müller (2010).<sup>32</sup> More precisely, the combination of the light verb and its predicative complement gives rise to a participle from which a noun or adjective can be derived (depending on the lexeme). What is especially interesting, as pointed out by Müller (2010), is that the participle does not always exist independently of the complex predicate.

The suffix -*ande* is added to the stem 1 of the verb, and it may be added to a complex predicate (72).


In the case studied by Müller, *bāz-konande*, the participle corresponding to the light verb construction exists, although the simplex participle does not (73).


Our analysis is as follows: the participle and its predicative complement may form a compound word, and it is to this compound that the suffix -*ande* is added. We adopt the representation of compounds in (74) from Bonami & Crysmann (2018: 178), here a compound noun, where the elements of the compound are the value of the feature M-DTRS (morphological daughters).<sup>33</sup>
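In outline, such a representation looks as follows (a schematic sketch showing only the features relevant here):

    [*compound-noun*
     PHON 1 ⊕ 2
     M-DTRS ⟨ [PHON 1 ], [PHON 2 ] ⟩]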

<sup>32</sup>Müller's analysis adopts a slightly different approach to the issues discussed in the section.

 

<sup>33</sup>For more discussion on morphology in HPSG, see Crysmann (2021), Chapter 21 of this volume.


Similarly, a compound is formed from the adjective and the verb in the case of *bāz-konande*. The verb *kon* in the complex predicate *bāz kon* 'to open' is described in (75). It expects a subject NP, the agent, and two complements, an adjective and an NP, which is attracted from the adjective. The content of the adjective is included in the content of the verb, as the nucleus of the caused *soa* (state of affairs) 'make something be adj' (see Müller 2010: 642).

(75)
    [CAT [HEAD *verb*
          ARG-ST ⟨ NP_k, 1 NP_j, A[PRD +, ARG-ST ⟨ 1 NP_j ⟩, CONT 2 ] ⟩]
     CONT [*soa*
           NUCLEUS [*cause-relation*, ACTOR *k*, SOA 2 ]]]

The lexeme *bāz-konande* is a noun, based on a compound with two m-daughters to which the suffix -*ande* is appended. These two daughters are similar to what they are in the complex predicate (75): the verbal element is expecting two complements, an adjective and an NP, and the semantics is as in (75). The verb denotes a cause relation taking as argument the adjective content, which is a relation taking the nominal complement as its argument (SS abbreviates SYNSEM). The description of this noun is given in (76).

(76)
    [*lexeme*
     PHON 1 ⊕ 2 ⊕ ⟨*ande*⟩
     SYNSEM [CAT [HEAD *noun*
                  COMPS ⟨ 3 NP ⟩]
             CONT [IND *k*]]
     M-DTRS ⟨ A[PHON 1 , PRD +, ARG-ST ⟨ 3 NP_j ⟩, CONT 4 ],
              V[PHON 2 , ARG-ST ⟨ NP_k, 3 NP_j, A[CONT 4 ] ⟩,
                CONT [NUCLEUS [*cause-relation*, ACTOR *k*, SOA 4 ]]] ⟩]

It is worth noticing that, as indicated in (77), the compound noun takes the NP expected by the verb as its complement (indicated by the brackets in (77)).

(77) [dar-e botri] bāz-konande
     lid-EZ bottle opener
     'a bottle opener'


The derived noun comes with the appropriate changes: it denotes the causer, the first argument of the verb m-daughter, and the suffix *-ande* is appended to the sequence of the two elements. Nothing in this description requires that the simplex participle (*\*kon-ande*) exist independently of the compound. Hence, the intriguing data in (73) are accounted for.

## **6.3 The semantics of light verb constructions**

In complex predicates, the noun is not referential; rather, it participates in the meaning of the verbal combination. However, in general, these nouns may also be used as ordinary referential nouns. We assume that such nouns come in two guises: predicative, noted [PRD+], occurring in complex predicates, and referential, noted [PRD–].

These complex predicates do not have a homogeneous semantics. The general idea is that the verb serves to turn a noun into a verb (Bonami & Samvelian 2010), but there is a spectrum, going from a (relatively) semantically compositional combination, to idioms whose semantics is not predictable from the components. Complex predicates exploit different schemas, which can be extended to new nouns, describing new situations. We will exemplify certain common cases, drawing on the detailed study of *zadan* 'to hit' in Samvelian (2012). The uses of *zadan* as a light verb are numerous and varied. We will not try to investigate them exhaustively; rather, we indicate different patterns for the combination of this verb with the noun.

The semantics of a complex predicate is often a specialization of that of the simplex verb. For instance, *lagad zadan* (lit. kick hit) means 'kick', and *sili zadan* (lit. slap hit) means 'slap'.

(78) Olāq be Omid lagad zad.
     donkey to Omid kick hit
     'The donkey kicked Omid.'

For cases like (78), the content of the complex predicate can be simply that of the noun, if they are of the same semantic type. In the example, they both denote events (the nucleus is an event-relation).

This is reminiscent of the way Wechsler (1995) represents the import of a PP with a verb like *talk*; the verb content itself is represented as a *soa* with one participant, the talker; the verb can take a number of PP complements (headed by *to*, *about*, …), which add semantic information describing the situation. The result is a description of a *soa* which combines partial descriptions. Similarly here, the combination of the two contents is identical to the content of *lagad* 'kick', as that latter content is more specialized than that of *zadan*. The complement of


the complex predicate may be an NP or a PP headed by *be* 'to' (the preposition is optional).

(79)
    [*zadan1-lexeme*
     CAT [HEAD *verb*
          ARG-ST ⟨ NP_k, (*be*) NP_m, N[PRD +]: 1 ⟩]
     CONT 1 [*soa*
             NUCLEUS [*kick-relation*, ACTOR *k*, UNDERGOER *m*]]]

Another case where the combination gives more information than the simplex verb is when this verb takes as its predicative complement a noun which can also occur as a referential noun denoting an instrument crucially involved in the situation (Bonami & Samvelian 2010). Examples are, in different domains, *čāqu zadan* (lit. knife hit) 'stab', *telefon zadan* (lit. phone hit) 'phone', *piāno zadan* (lit. piano hit) 'play the piano'. We illustrate here with *šāne zadan* (lit. comb hit) 'comb' (example adapted from Bonami & Samvelian 2010).


(80) Maryam mu-hā=yaš=rā šāne zad.
     Maryam hair-PL=3SG=RA comb hit
     'Maryam combed her hair.'

(81)
    [*zadan2-lexeme*
     CAT [HEAD *verb*
          ARG-ST ⟨ NP_k, NP_m, N[PRD +]: 1 ⟩]
     CONT 1 [*soa*
             NUCLEUS [*comb-relation*, ACTOR *k*, UNDERGOER *m*]
             BACKGROUND { ∃x [comb(x) ∧ use( 1 , *k*, x)] }]]

The condition in the background can be read as follows: the situation 1 involves that there exists a comb and that the actor *k* uses it in that situation. Although the complex predicate includes the content of the predicative complement, the meaning of the complex predicate does not reduce to that of its semantically more specialized member, as in the preceding case, but adds a restriction on the background: the existence of an object and the fact that, in the


situation, such an object is used (see Bonami & Samvelian 2010: 10). The complex predicate formation relies on the same semantic process as a denominal verb, derived from an instrumental noun (*to ski*, *to iron*). Further from a compositional or recoverable meaning is the use of *zadan*, or more precisely *xod=rā zadan* (lit. self hit), with a series of nouns denoting illnesses, handicaps or problematic states (like stupidity, ignorance, etc.): it means 'to pretend, feign' the illness or state in question (example (82) from Samvelian 2012: 223).

(82) Maryam xod=rā be divānegi zad.
     Maryam self=RA to madness hit
     'Maryam feigned madness.'

This use of *zadan* may be seen as an extension of its use with nouns denoting some sort of deceit, such as *gul zadan* (lit. deceit hit) 'to deceive': as in (79), the noun imposes its content on the combination, with a metaphorical use of the verb, retaining from the physical violence meaning of *zadan* 'hit' the idea of an action to the detriment of someone. Nevertheless, nothing in the actual combination in (82) indicates deception. Not all nouns for illnesses are acceptable, only those which cannot really be verified in the situation: a state of fatigue, but not a heart attack. We group them as objects of type *internal-problematic-state*. Here the combination of the verb and the noun is standard, in that the noun is a semantic argument of the verb, but the meaning of the verb is unpredictable.

(83)
    [*zadan3-lexeme*
     CAT [HEAD *verb*
          ARG-ST ⟨ NP_k, *pro*_k, PP[PFORM *be*]: 1 [*internal-problematic-state*] ⟩]
     CONT [*soa*
           NUCLEUS [*feign-relation*, ACTOR *k*, SOA 1 ]]]

Note that, contrary to *zadan1-lexeme*, with which *be* 'to' is optional, the *zadan3-lexeme* requires the predicative complement to be in fact a PP, headed by *be*. We assume that the preposition *be* (frequent in the complement of a complex predicate) is contentless and shares syntactic (the [PRD ±] value) and semantic information with its complement, the predicative N ([CONT 1 ]); this is indicated by treating *be* as the value of the feature P(REPOSITION) FORM (Pollard & Sag 1987: Chapter 3).
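A sketch of the corresponding description of *be* (one way of encoding the sharing just described; the feature geometry is simplified and the entry is illustrative):

    *be* (predicative):
    [HEAD *prep*[PFORM *be*, PRD 1 ]
     ARG-ST ⟨ N[PRD 1 ]: 2 ⟩
     CONT 2 ]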

 


Finally, we turn to an idiom: *dast zadan* (lit. hand hit) meaning 'to start'. The combination may mean, in a more recoverable way, 'to touch' with PP complements denoting concrete objects (as in (64)), or 'to applaud' with a PP complement denoting a person (84a) (from Samvelian 2012: 45). However, it means 'to start' with a PP complement denoting an event as in (84b) (from Samvelian 2012: 185).

- b. Kārgar-ān be e'tesāb dast zad-and.
     worker-PL to strike hand hit-3PL
     'The workers went on strike.'

To represent the idiom, we resort to the feature LID (lexical identifier) which is associated with lexemes in the lexicon, contains semantic information and allows the verb to select a specific form (Sag 2007: 410–411; 2012: 127–133). A noun or a verb can have a literal (*l-rel*) or an idiomatic content (*i-rel*); the verb of the idiom selects the second one. The noun *dast* in the idiomatic complex predicate *dast zadan* corresponds to the *i-dast-relation* and is selected by the idiomatic *zadan*. The preposition *be*, which heads the other complement, is the same as in *zadan-3*: it identifies its content with that of its complement.

The description of *zadan-4*, which occurs in the idiom *dast zadan* 'to start', is given in (85). The predicative noun complement being specified with the LID value *dast*, it is only in combination with the noun *dast* that *zadan* acquires this meaning.
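In outline, parallel to (83) (a sketch; the name of the nucleus relation is chosen for concreteness):

    [*zadan4-lexeme*
     CAT [HEAD *verb*
          ARG-ST ⟨ NP_k, N[PRD +, LID *i-dast-relation*], PP[PFORM *be*]: 1 ⟩]
     CONT [*soa*
           NUCLEUS [*start-relation*, ACTOR *k*, SOA 1 ]]]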



As usual with light verb constructions whose verb has a general meaning, the different instances of *zadan* do not share a common core meaning. Rather, they are organized by similarities, analogies and metaphors, a configuration which Wittgenstein called "family resemblance" (see the famous example of *game* in


Wittgenstein 2001: §66–67). Nevertheless, the reader will have noticed that the four instances of *zadan* discussed above have a lot in common. The respective commonalities are factored out in a multiple inheritance hierarchy. The details cannot be discussed here but see Abeillé & Borsley (2021: Section 4), Chapter 1 of this volume and Davis & Koenig (2021), Chapter 4 of this volume for more on the hierarchical lexicon in HPSG.

# **7 Conclusion**

Following the usual definition of complex predicates in HPSG as a series of (at least) two predicates, of which one is the head attracting the complements of the other, we have studied them in different languages: Romance languages, German, Korean and Persian. These languages illustrate three ways in which argument attraction (or composition) manifests itself: clitic climbing (and, more generally, bounded dependencies); flexible word order, mixing the arguments of the two predicates; and special semantic combinations, which build a lexeme out of the two predicates (particularly from the verb and the noun in light verb constructions).

HPSG is well-equipped to model complex predicates. The feature structure associated with a predicate specifies which complements it is waiting for, and the feature structure associated with a phrase allows it to be non-saturated regarding its complements, a possibility exploited by a number of verbs which are or can be the head of a complex predicate: the phenomenon is lexically driven. Certain verbs have two entries, one which takes a saturated complement, one which is the head of a complex predicate; but a head can itself be flexible, accepting a complement which is saturated, partially saturated or not saturated at all: this is the case of the copula in Romance languages.

Crucially, the mechanism of argument attraction is not tied to a specific syntactic structure; on the contrary, it is compatible with different structures. We have shown that the properties of a verbal complex (where the two predicates form a syntactic unit by themselves) differ from those of a flat structure (where the two predicates form a unit with the complements). The structures can characterize one language as opposed to another one (Spanish restructuring verbs contrast with Italian ones), but they can also be present in the same language (as in Romanian, for instance; see Monachesi 1999).

Similarly, the mechanism of argument attraction does not induce a specific semantic combination: it is compatible with a compositional semantics (as in a verb + adjective combination in Persian, or modal verb + infinitive complement

#### Danièle Godard & Pollet Samvelian

in Romance languages), as well as a variety of senses specific to the combination of the verb with a class of complements. The semantic description of complex predicates in HPSG can exploit different aspects of the formalism. These include the hierarchical organization of the lexicon and the mechanism of conjunction of descriptions (informally referred to as unification, as with combinations specializing the meaning of the verb in Persian); the informational richness of feature structures which include a BACKGROUND feature that a construction can impose restrictions on (as when the noun corresponds to an instrument implied in the action); and a LID feature which allows a particular complex predicate to point to a specific form (for representing idioms).

# **Abbreviations**

EZ Persian suffix Ezafe
CONN connective
RA Persian suffix *rā*

# **Acknowledgments**

We thank Anne Abeillé, Gabriela Bîlbîe, Olivier Bonami, Caterina Donati, Han Joung-Youn, Jean-Pierre Koenig, Kil Soo Ko, Paola Monachesi, Stefan Müller, Tsuneko Nakazawa, Daniel Rojas Plata, and Stephen Wechsler.

# **References**


# **Chapter 12**

# **Control and Raising**

# Anne Abeillé

Université de Paris

The distinction between raising predicates and control predicates has been a hallmark of syntactic theory since the 1960s. Unlike transformational analyses, HPSG treats the difference as mainly a semantic one: raising verbs (*seem*, *begin*, *expect*) neither semantically select their subject (or object) nor assign it a semantic role, while control verbs (*want*, *promise*, *persuade*) semantically select all their syntactic arguments. On the syntactic side, raising verbs share their subject (or object) with the unexpressed subject of their non-finite complement, while control verbs only coindex them. The distinction between raising and control lexeme types is also relevant for non-verbal predicates such as adjectives (*likely* vs. *eager*). The analysis of the complement of both control and raising verbs as phrasal, rather than clausal (or small clause), will be supported by creole data. The distinction between subject and first syntactic argument will be discussed together with data from ergative languages, and the HPSG analysis will be extended to cover cases of obligatory control of the expressed subject of some finite clausal complements in certain languages. The raising analysis naturally extends to copular constructions (*become*, *consider*) and most auxiliary verbs.

# **1 The distinction between raising and control predicates**

# **1.1 The main distinction between raising and control verbs**

In a broad sense, *control* refers to a relation of referential dependence between an unexpressed subject (the controlled element) and an expressed or unexpressed constituent (the controller); the referential properties of the controlled element, including possibly the property of having no reference at all, are determined by those of the controller (Bresnan 1982: 372). Verbs taking non-finite complements usually determine the interpretation of the unexpressed subject of the non-finite verb. With *want*, the subject is understood as the subject of the infinitive, while with *persuade* it is the object, as shown by the reflexives in (1). They are called "control verbs", and *John* is called the "controller" in (1a) while *Mary* is the controller in (1b).

	- b. John persuaded Mary to buy herself / \* himself a coat.

Another type of verb also takes a non-finite complement and identifies its subject (or its object) with the unexpressed subject of the non-finite verb. Since Postal (1974), they have been called "raising verbs". What semantic role the missing subject has, if any, is determined by the lower verb, or, if that is a raising verb, an even lower verb. In (2a) the subject of the infinitive (*like*) is understood to be the subject of *seem*, while in (2b) the subject of the non-finite verb (*buy*) is understood to be the object of *expect*. Verbs like *seem* are called "subject-to-subject-raising verbs" (or "subject-raising verbs"), while verbs like *expect* are called "subject-to-object-raising verbs" (or "object-raising verbs").

	- b. John expected Mary to buy herself / \* himself a coat.

Raising and control constructions differ from other constructions in which the missing subject remains vague (3) and which are a case of "arbitrary" or "anaphoric" control (Chomsky 1981: 75–76; Bresnan 1982: 379).<sup>1</sup>

(3) Buying a coat can be expensive.

A number of syntactic and semantic properties distinguish control verbs like *want*, *hope*, *force*, *persuade*, *promise* from raising verbs like *see*, *seem*, *start*, *believe*, *expect* (Rosenbaum 1967; Postal 1974; Bresnan 1982).<sup>2</sup>

The key point is that there is a semantic role associated with the subject of verbs like *want* but not of verbs like *seem* and with the first complement of verbs like *persuade* but not of verbs like *expect*. The consequence is that more or less any NP is possible as subject of *seem* and as the first NP after *expect*. This includes expletive *it* and *there* and non-referential parts of idioms.

<sup>1</sup>Bresnan (1982) proposes a non-transformational analysis and renames "raising" to "functional control" and "control" (obligatory) to "anaphoric control". See also Wechsler & Asudeh (2021: Section 11), Chapter 30 of this volume.

<sup>2</sup>The same distinction is available for verbs taking a gerund-participle complement: *Kim remembered seeing Lee.* (control) vs. *It started raining.* (raising).


Let us first consider expletive subjects: meteorological *it* is selected by predicates such as *rain*. It can be the subject of *start*, *seem*, but not of *hope*, *want*. It can be the object of *expect*, *believe* but not of *force*, *persuade*:

	- b. It seems/started to rain this morning.
	- c. We expect it to rain tomorrow.
	- b. # The sorcerer persuaded it to rain.

The same contrast holds with an idiomatic subject such as *the cat* in the expression *the cat is out of the bag* 'the secret is out'. It can be the subject of *seem* or the object of *expect*, with its idiomatic meaning. If it is the subject of *want* or the object of *persuade*, the idiomatic meaning is lost and only the literal meaning remains.

	- b. The cat seems to be out of the bag.
	- c. We expected the cat to be out of the bag.
	- d. # The cat wants to be out of the bag. (non-idiomatic)
	- e. # We persuaded the cat to be out of the bag. (non-idiomatic)

Let us now look at non-nominal subjects: *be obvious* allows for a sentential subject (7b) and *be a good place to hide* allows for a prepositional subject (8b). They are possible with raising verbs, as in the following:

	- b. [That Kim is a spy] seemed to be obvious.
	- b. Kim expects [under the bed] to be a good place to hide.

But they would not be possible with control verbs:

	- b. # Kim persuaded [under the bed] to be a good place to hide.

In languages such as German, subjectless constructions can be embedded under raising verbs but not under control verbs (Müller 2002: 48); subjectless passive *gearbeitet* 'worked' can thus appear under *scheinen* 'seem' but not under *versuchen* 'try':


	- b. Dort schien noch gearbeitet zu werden. (German)
	     there seemed yet worked to be
	     'Work seemed to still be being done there.'
	- c. * Der Student versucht, gearbeitet zu werden.
	       the student tries worked to be
	     Intended: 'The student tries to get the work done.'

All this shows that the kind of subject (or object) that a raising verb may take depends only on the embedded non-finite verb.


Let us now look at possible paraphrases: when control and raising sentences have a corresponding sentence with a finite clause complement, they have rather different related sentences. With control verbs, the non-finite complement may often be replaced by a sentential complement (with its own subject), while this is not possible with raising verbs:

	- b. Bill expected Sandy [to come] / \*[that she would come].

With some raising verbs, on the other hand, a sentential complement is possible with an expletive subject (13a) or with no postverbal object (13b).

	- b. Kim expected [that Sandy would come].

This shows that the control verbs can have a subject (or an object) different from the subject of the embedded verb, but that the raising verbs cannot.<sup>3</sup>

<sup>3</sup>Another contrast proposed by Jacobson (1990: 444) is that control verbs may allow for a null complement (*She tried.*) or a non-verbal complement (*They wanted a raise.*), while raising verbs may not (*\*She seemed.*). However, some raising verbs may have a null complement (*It just started* (*to rain*).) as well as some auxiliaries (*She doesn't.*) which can be analyzed as raising verbs (see Section 4 below).

# **1.2 More on control verbs**

For control verbs, the choice of the controller is determined by the semantic class of the verb (Pollard & Sag 1994: Chapter 3 and also Jackendoff & Culicover 2003). Verbs of influence (*permit*, *forbid*) are cases of object control while verbs of commitment (*promise*, *try*) as in (14a) and orientation (*want*, *hate*) as in (14b) display subject control, as shown by the reflexive in the following examples:<sup>4</sup>

	- b. John permitted Mary to buy herself / \* himself a coat.

This classification of control verbs is cross-linguistically widespread (Van Valin & LaPolla 1997), but Romance verbs of mental representation and speech report are an exception in being subject-control without having a commitment or an orientation component.

	- b. Paul pensait avoir compris. (French)
	     Paul thought have understood
	     'Paul thought he understood.'

It is worth noting that for object-control verbs, the controller may also be the complement of a preposition (Pollard & Sag 1994: 139):

(16) Kim appealed [to Sandy] to cooperate.

Bresnan (1982: 401), who attributes the generalization to Visser, also suggests that object-control verbs may passivize (and become subject-control) while subject-control verbs cannot (with a verbal complement). However, there are counterexamples like (17c) adapted from Hust & Brame (1976: 255), and the generalization does not seem to hold crosslinguistically (see Müller 2002: 129 for counterexamples in German).

	- b. \* Mary was promised to leave (by John).
	- c. Pat was promised to be allowed to leave.

<sup>4</sup>Some verbs may be ambiguous and allow for subject control (*John proposed to come later.*), object control (*John proposed to Mary to wash herself.*), and joint control (*John proposed to Mary to go to the movies together.*). For the joint control case, a cumulative (i+j) index is needed, as is also the case with long distance dependencies; see Chaves & Putnam (2020: Chapter 3) and Borsley & Crysmann (2021: 549), Chapter 13 of this volume:

(i) Setting aside illegal poaching for a moment, how many sharks<sub>i+j</sub> do you estimate [[<sub>i</sub> died naturally] and [<sub>j</sub> were killed recreationally]]?


# **1.3 More on raising verbs**

From a cross-linguistic point of view, raising verbs usually belong to other semantic classes than control verbs. The distinction between subject-raising and object-raising also has some semantic basis: verbs marking tense, aspect, modality (*start*, *cease*, *keep*) are subject-raising, while causative and perception verbs (*let*, *see*) are usually object-raising:

	- b. It started to rain.
	- c. John let it appear that he was tired.
	- d. John let Mary buy herself / \* himself a coat.

Transformational analyses posit distinct syntactic structures for raising and control sentences: subject-raising verbs select a sentential complement (and no subject), while subject-control verbs select a subject and a sentential complement (Postal 1974: 33–39; Chomsky 1981: 55–64).<sup>5</sup> With subject-raising verbs, the embedded clause's subject is supposed to move to the position of matrix verb subject, hence the term "raising". Transformational analyses also posit two distinct structures for object-control and object-raising verbs: while object-control verbs select two complements, object-raising verbs only select a sentential complement, and an exceptional case marking (ECM) rule assigns case to the embedded clause's subject. In this approach, both subject- and object-raising verbs have a sentential complement:

	- We expected [<sup>S</sup> John to leave]

However, the putative correspondence between source structure (before movement) and target structure (after movement) for raising verbs is not systematic: *seem* may take a sentential complement (with an expletive subject) as in (13a), but the other subject-raising verbs (aspectual and modal verbs) do not.

	- b. \* It started [that Paul understands].

<sup>5</sup> I disregard here the Movement Theory of Control (Hornstein 1999); see Landau (2000) for criticism.


Similarly, while some object-raising verbs (*expect, see*) may take a sentential complement as in (13b), others do not (*let*, *make*, *prevent*).

	- b. \* We let [that Paul sleeps].

Furthermore, in transformational analyses, it is often assumed that the subject of the non-finite verb must raise to receive case from the matrix verb. But the subject of *seem* or *start* need not bear case, since it can be a non-nominal subject (8b). Data from languages with "quirky" case such as Icelandic also show that subjects of subject-raising verbs in fact keep the quirky case assigned by the embedded verb (Zaenen et al. 1985: 449), in contrast to the subjects of subject-control verbs, which are assigned case by the matrix verb and are thus in the nominative. A verb like *need* takes an accusative subject, and a raising verb (*seem*) takes an accusative subject as well when combined with *need* (22b). With a control verb (*hope*), on the other hand, the subject must be nominative (22c).<sup>6</sup>

(22) a. Hana vantar peninga. (Icelandic)
        she.ACC lacks money.ACC
        'She lacks money.'


Finally, the possibility of an intervening PP between the matrix verb and the non-finite verb should block subject movement, according to Chain formation or Relativized Minimality (Rizzi 1986; 1990).

	- b. Carol seems to herself [e to have been quite fortunate].<sup>7</sup>

Turning now to object-raising verbs, when a finite sentential complement is possible, the structure is not the same as with a non-finite complement. Heavy NP shift is possible with a non-finite complement, but not with a sentential complement (Bresnan 1982: 423; Pollard & Sag 1994: 113): this shows that *expect* has two complements in (24a) and only one in (24c).

<sup>6</sup>The examples in (22) are from Sag, Wasow & Bender (2003: 386–387).

<sup>7</sup>McGinnis (2004: 50)

	- b. We expected [to understand] [all those who attended the class].
	- c. We expected [that [all those who attended the class] understand].
	- d. \* We expected [that understand [all those who attended the class]].

This shows that object-raising verbs are better analyzed as two-complement verbs. This analysis predicts that the subject of the non-finite verb has all properties of an object of the matrix verb. It is an accusative in English (*him, her*) (25) and it can passivize, like the object of an object-control verb (26).

	- b. We persuaded him to work on this.
	- b. He was persuaded to work on this.

To conclude, the movement (raising) analysis of subject-raising verbs and the ECM analysis of object-raising verbs are motivated by the idea that an NP which receives a semantic role from a verb should be a syntactic argument of this verb. But they lead to syntactic structures which are not motivated (assuming a systematic availability of a sentential complementation) and/or make wrong empirical predictions (that the postverbal sequence of an ECM verb behaves as one constituent instead of two).

# **1.4 Raising and control non-verbal predicates**

Non-verbal predicates taking a non-finite complement may also fall under the raising/control distinction. Adjectives such as *likely* have raising properties: they neither select the category of their subject nor assign it a semantic role, in contrast to adjectives like *eager*. Meteorological *it* is thus compatible with *likely*, but not with *eager*. In the following examples, the subject of the adjective is the same as the subject of the copula (see Section 3 below).

	- b. John is likely / eager to work here.
	- c. \* It is eager to rain.

#### 12 Control and Raising

The same contrast may be found with nouns taking a non-finite complement. Nouns such as *tendency* have raising properties: they neither select the category of their subject nor assign it a semantic role, in contrast to nouns like *desire*. Meteorological *it* is thus compatible with the former, but not with the latter. In the following examples, the subject of the predicative noun is the same as the subject of the light verb *have*.

	- b. John has a desire to win.
	- c. It has a tendency / \* desire to rain at this time of year.

# **2 An HPSG analysis**

In a nutshell, the HPSG analysis rests on a few leading ideas: non-finite complements are unsaturated VPs (a verb phrase with a non-empty SUBJ list); a syntactic argument need not be assigned a semantic role; control and raising verbs have the same syntactic arguments; raising verbs do not assign a semantic role to the syntactic argument that functions as the subject of their non-finite complement. I continue to use the term *raising*, but it is just a cover term, since no movement takes place in HPSG analyses.

In HPSG terminology, raising means full identity of syntactic and semantic information (*synsem*) (Abeillé & Borsley 2021: 18–19, Chapter 1 of this volume) with the unexpressed subject, while control involves identity of semantic indices (discourse referents) between the controller and the unexpressed subject. Coindexing is compatible with the controller and the controlled subject not bearing the same case (22c) or belonging to different parts of speech (16), as is the case for pronouns and antecedents (see Müller 2021a, Chapter 20 of this volume on Binding Theory). This would not be possible with raising verbs, where there is full sharing of syntactic and semantic features between the subject (or the object) of the matrix verb and the (expected) subject of the non-finite verb. In German, the nominal complement of a raising verb like *sehen* 'see' must agree in case with the subject of the infinitive, as shown by the adverbial phrase *einer nach dem anderen* 'one after the other' which agrees in case with the unexpressed subject of the infinitive, but it can have a different case with a control verb like *erlauben* 'allow', as the following examples from Müller (2002: 47–48) show:

(29) a. Der Wächter sah den Einbrecher und seinen Helfer einen / * einer nach dem anderen weglaufen. (German)
        the watchman saw the burglar.ACC and his accomplice.ACC one.ACC / one.NOM after the other run.away
        'The watchman saw the burglar and his accomplice run away, one after the other.'
     b. Der Wächter erlaubte den Einbrechern, einer nach dem anderen wegzulaufen.
        the watchman allowed the burglars.DAT one.NOM after the other away.to.run
        'The watchman allowed the burglars to run away, one after the other.'
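The case facts in (29) can be made concrete with a small sketch. This is my own ad hoc encoding (plain Python dictionaries, with structure sharing modeled as object identity and coindexing as equality of an `ind` value), not the chapter's AVM notation:

```python
# Raising shares the entire synsem object, so case must match, as in (29a);
# control shares only the index, so case may differ, as in (29b).
# All names and values here are illustrative assumptions.

raised = {"cat": "NP", "case": "acc", "ind": "i"}
raising_subject, raising_controllee = raised, raised   # one token, shared

controller = {"cat": "NP", "case": "dat", "ind": "i"}
controllee = {"cat": "NP", "case": "nom", "ind": controller["ind"]}

assert raising_subject is raising_controllee       # full identity (raising)
assert controller["ind"] == controllee["ind"]      # mere coindexing (control)
assert controller["case"] != controllee["case"]    # case mismatch OK, cf. (29b)
```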

I will first present in more detail the HPSG analysis of raising and control verbs, then provide creole data (from Mauritian) which support a phrasal analysis of their complement, then discuss the implication of control/raising for pro-drop and ergative languages, to end up with a revised HPSG analysis, based on sharing XARG instead of SUBJ.

# **2.1 The HPSG analysis of "raising" verbs**

Subject-raising verbs (and object-raising verbs) can be defined as subtypes inheriting from *verb-lexeme* and *subject-raising-lexeme* (or *object-raising-lexeme*) types. Figure 1 shows parts of a possible type hierarchy.

Figure 1: A type hierarchy for subject- and object-raising verbs

As in Abeillé & Borsley (2021: Section 4.1), Chapter 1 of this volume, upper case letters are used for the two dimensions of classification, and *verb-lx*, *intr-lx*, *tr-lx*, *subj-rsg-lx*, *obj-rsg-lx*, *or-v-lx* and *sr-v-lx* abbreviate *verb-lexeme*, *intransitive-lexeme*, *transitive-lexeme*, *subject-raising-lexeme*, *object-raising-lexeme*, *object-raising-verb-lexeme* and *subject-raising-verb-lexeme*, respectively. The figure also shows three examples (*likely*, *seem* and *expect*) inheriting from *sr-a-lx* (for *subject-raising adjective lexeme*), *sr-v-lx*, and *or-v-lx*, respectively. The constraints on the types *subj-rsg-lx* and *obj-rsg-lx* are as follows:<sup>8</sup>

(30) a. *subj-rsg-lx* ⇒ [ARG-ST 1 ⊕ ⟨…, [SUBJ 1 ]⟩]
     b. *obj-rsg-lx* ⇒ [ARG-ST ⟨NP⟩ ⊕ 1 ⊕ ⟨[SUBJ 1 ]⟩]

The SUBJ value of the non-finite verb is appended to the beginning of the ARG-ST and, provided 1 contains an element, this means that the subject of the embedded verb is also the subject of the subject-raising verb in (30a). Similarly, if 1 is a singleton list, the subject of the non-finite verb will be the second element of the ARG-ST list of the object-raising verb in (30b).

This means that both subject descriptions share their syntactic and semantic features: they have the same semantic index, but also the same part of speech, the same case, etc. Thus a subject appropriate for the non-finite verb is appropriate as a subject (or an object) of the raising verb: this allows for expletive ((4b), (4c)) or idiomatic ((6b), (6c)) subjects, as well as non-nominal subjects (8b). If the embedded verb is subjectless, as in (10), this information is shared too ( 1 can be the empty list). The dots in (30) account for a possible PP complement as in *Kim seems to Sandy to be smart.*, which we ignore in what follows.
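Since the constraints in (30) are, at bottom, constraints on list-valued data structures, their effect can be emulated in a few lines of ordinary code. The following is a minimal sketch under a deliberately naive encoding (Python lists for HPSG lists, object identity for structure sharing); the function names `subj_rsg` and `obj_rsg` and the sample entries are mine, not part of the formalism:

```python
def subj_rsg(vp):
    """(30a): ARG-ST is the VP's SUBJ list plus the VP itself."""
    return {"arg_st": vp["subj"] + [vp]}

def obj_rsg(np, vp):
    """(30b): ARG-ST is an NP, then the VP's SUBJ list, then the VP."""
    return {"arg_st": [np] + vp["subj"] + [vp]}

# 'to rain' comes with an expletive subject requirement.
rain = {"head": "verb", "subj": [{"cat": "NP", "form": "it"}]}
seem = subj_rsg(rain)                     # It seems to rain.
expect = obj_rsg({"cat": "NP"}, rain)     # We expect it to rain.

# Structure sharing, not copying: the raising verb's subject (or object)
# IS the element on the embedded verb's SUBJ list.
assert seem["arg_st"][0] is rain["subj"][0]
assert expect["arg_st"][1] is rain["subj"][0]
```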

A subject-raising verb (*seem*) and an object-raising verb (*expect*) inherit from *sr-v-lx* and *or-v-lx* respectively, which are subtypes of *subj-rsg-lx* and *obj-rsg-lx* (see Figure 1); their lexical descriptions are as follows, assuming an MRS-inspired semantics (Copestake et al. 2005 and Koenig & Richter 2021: Section 6.1, Chapter 22 of this volume):

<sup>8</sup>⊕ is used for list concatenation. The category of the complement is not specified as a VP because it may be a V in some Romance languages with a flat structure (Abeillé & Godard 2003) and in some verb-final languages where the matrix verb and the non-finite verb form a verbal complex (German, Dutch, Japanese, Persian, Korean; see Müller 2021b, Chapter 10 of this volume on constituent order and Godard & Samvelian 2021, Chapter 11 of this volume on complex predicates). Furthermore, other subtypes of these lexical types will also be used for copular verbs that take non-verbal predicative complements; see Section 3.



Raising verbs take a VP and not a clausal complement, which means that the embedded infinitive has its complements realized locally (if any) but not its subject. The corresponding simplified trees are as shown in Figures 2 and 3. Notice that the syntactic structures are the same as for control verbs (Figures 5 and 6).

Raising verbs have in common a mismatch between syntactic and semantic arguments: the raising verb has a subject (or an object) which is not one of its semantic arguments (its INDEX does not appear in the CONT feature value of the raising verb). To constrain this type of mismatch, Pollard & Sag (1994: 140) propose the Raising Principle.

(33) Raising Principle: Let X be a non-expletive element subcategorized by Y; X is not assigned any semantic role by Y iff Y also subcategorizes for a complement which has X as its first argument.

Figure 2: A sentence with a subject-raising verb

Figure 3: A sentence with an object-raising verb

This principle was meant to prevent raising verbs from omitting their VP complement, unlike control verbs (Jacobson 1990: 444). Without a non-finite complement, the subject of *seem* is not assigned any semantic role, which violates the Raising Principle. However, some unexpressed (null) complements are possible with some subject-raising verbs, as well as VP ellipsis with English auxiliaries, which are analyzed as subject-raising verbs (see Section 4 below and Nykiel & Kim 2021: Section 5, Chapter 19 of this volume on predicate/argument ellipsis). So the Raising Principle should be reformulated in terms of argument structure (which includes unexpressed arguments) and not valence features.

	- b. John just started.
	- c. John did.

For subject-raising verbs which allow for a sentential complement as well (with an expletive subject) (13a), another lexical description is needed (see (35a)), and the same holds for object-raising verbs which allow a sentential complement (with no object) ((13b); see (35b)). These can be seen as valence alternations, which are available for some items (or some classes of items) but not all (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume on argument structure).

(35) a. *seem*: [ARG-ST ⟨NP[*it*], S⟩]
     b. *expect*: [ARG-ST ⟨NP, S⟩]

# **2.2 The HPSG analysis of control verbs**

Sag & Pollard (1991) propose a semantics-based control theory in which the semantic class of the verb determines whether it is subject-control or object-control. They distinguish verbs of orientation (*want*, *hope*), verbs of commitment (*promise*, *try*) and verbs of influence (*persuade*, *forbid*) based on the type of relation and semantic roles of their arguments. Relational types for control predicates can be organized in a type hierarchy like the one given in Figure 4, adapted from Sag & Pollard (1991: 78).<sup>9</sup>

For example, *want*, *promise* and *persuade* have semantic content such as the following, where SOA means state-of-affairs and denotes the content of the nonfinite complement:<sup>10</sup>

<sup>9</sup>For further semantic classification of main predicates in order to account for optional control in languages such as Modern Greek and Modern Standard Arabic, see Greshler et al. (2017).

<sup>10</sup>The fact that SOA has a value of type *relation* follows from the general setup of AVMs that is specified as the so-called signature of the grammar and need not be given here (see Richter 2021: Section 3, Chapter 3 of this volume). I state it nevertheless for reasons of exposition.


Figure 4: A type hierarchy for control predicates

According to this theory, the controller is the experiencer with verbs of orientation, the commitor with verbs of commitment, and the influencer with verbs of influence. From the syntactic point of view, two types of control predicates, *subject-cont-lx* and *object-cont-lx*, can be defined as follows:

(37) a. *subj-contr-lx* ⇒ [ARG-ST ⟨NP_i, …, [SUBJ ⟨[IND *i*]⟩]⟩]
     b. *obj-contr-lx* ⇒ [ARG-ST ⟨[], XP_i, [SUBJ ⟨[IND *i*]⟩]⟩]

The controller is the first argument with subject-control verbs, while it is the second argument with object-control verbs. Contrary to the types defined for raising predicates in (30), the controller here is simply coindexed with the subject of the non-finite complement. Since the controller is referential and since it is coindexed with the controlee, the controlee has to be referential as well. This means it must have a semantic role (since it has a referential index); thus expletives and (non-referential) idiom parts are not allowed ((5a), (5b), (6d), (6e)). This also implies that its syntactic features may differ from those of the subject of the non-finite complement: it may have a different part of speech (an NP subject can be coindexed with a PP controller) as well as a different case ((16), (22c)).

Verbs of orientation and commitment inherit from the type *subj-contr-lx*, while verbs of influence inherit from the type *obj-contr-lx*. A subject-control verb (*want*) and an object-control verb (*persuade*) inherit from *sc-v-lx* and *oc-v-lx* respectively; their lexical descriptions are as follows:



The corresponding structures for subject-control and object-control sentences are illustrated in Figures 5 and 6.

Figure 5: A sentence with a subject-control verb

Figure 6: A sentence with an object-control verb

In some Slavic languages (Russian, Czech, Polish), the predicative adjective must share case with the subject of the copular verb (40a): some subject-control verbs may allow case sharing like subject-raising verbs (40b), unlike object-control verbs (40c). As proposed by Przepiórkowski (2004) and Przepiórkowski & Rosen (2005), coindexing does not prevent full sharing, so the analysis may allow for both (shared) nominative and (default) instrumental case for the unexpressed subject and the predicative adjective, and a specific constraint may be added to enforce only (nominative) case sharing with the relevant set of verbs.<sup>11</sup>


For control verbs which allow for a sentential complement as well ((11a), (12a)), another lexical description of the kind in (41) is needed. These can be seen as valence alternations, which are available for some items (or some classes of items) but not all (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume on argument structure).

(41) a. *want*: [ARG-ST ⟨NP, S⟩]
     b. *promise*: [ARG-ST ⟨NP, NP, S⟩]

# **2.3 Raising and control verbs in Mauritian**

Mauritian, which is a French-based creole, provides some evidence for a phrasal (and not sentential) analysis of the verbal complement of raising and control verbs. Mauritian raising and control verbs belong roughly to the same semantic classes as in English or French. Verbs marking aspect or modality (*kontign* 'continue', *aret* 'stop') are subject-raising verbs, and causative and perception verbs (*get* 'watch') are object-raising. Raising verbs have different properties from TMA (tense modality aspect) markers: they are preceded by the negation, which follows TMA, and they can be coordinated, unlike TMA (Henri & Laurens 2011: 209):

(42) a. To pou kontign ou aret bwar? (Mauritian)
        2SG IRR continue.SF or stop.SF drink.LF
        'You will continue or stop drinking?'

<sup>11</sup>The examples in (40) are taken from Przepiórkowski (2004: ex (6)–(7)).


     b. * To'nn ou pou aret bwar?
          2SG'PRF or IRR stop.SF drink.LF
        'You have or will stop drinking?'

If their verbal complement has no external argument, as is the case with impersonal expressions such as *ena lapli* 'to rain', then the raising verb itself has no external argument, in contrast to a control verb like *sey* 'try':

	- b. * Sey ena lapli.
	       try have.SF rain
	     Literally: 'It tries to rain.'

Verb morphology in Mauritian provides an argument for the phrasal (and not clausal) status of the complement of both control and raising verbs. Unlike in French, its superstrate, in Mauritian, verbs inflect neither for tense, mood and aspect nor for person, number, and gender. But they have a short form and a long form (henceforth SF and LF), with 30% of verbs showing a syncretic form (as for example *bwar* 'drink'). The following list of examples provides pairs of short and long forms respectively:

	- b. pans/panse 'think', kontign/kontigne 'continue', konn/kone 'know'

As described in Henri (2010: Chapter 4), the verb form is determined by the construction: the short form is required before a non-clausal complement, and the long form appears otherwise.<sup>12</sup>

(45) a. Zan sant [sega] / manz [pom] / trov [so mama] / pans [Paris]. (Mauritian)
        Zan sing.SF sega / eat.SF apple / find.SF POSS mother / think.SF Paris
        'Zan sings a sega / eats an apple / finds his mother / thinks about Paris.'
     b. Zan sante / manze.
        Zan sing.LF / eat.LF
        'Zan sings / eats.'

<sup>12</sup>*yer* 'yesterday' is an adjunct. See Hassamal (2017) for an analysis of Mauritian adverbs which treats as complements those that trigger the verb short form.


     c. Zan ti sante yer.
        Zan PRF sing.LF yesterday
        'Zan sang yesterday.'

Henri (2010: 258) proposes to define two possible values (*sf* and *lf*) for the head feature VFORM, with a lexical constraint on verbs simplified as follows (*nelist* stands for non-empty list):

(46) *v-word*[VFORM *sf*] ⇒ [COMPS *nelist*]
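For illustration only, the form alternation can be rendered as a one-line decision over the COMPS list. The sketch below treats the choice as fully deterministic, which abstracts away from the extraposed clausal complements discussed next; the function name `vform` is my own:

```python
def vform(comps):
    """Return 'sf' with a non-empty (canonical) COMPS list, 'lf' otherwise."""
    return "sf" if comps else "lf"

assert vform([{"cat": "NP", "phon": "sega"}]) == "sf"  # Zan sant sega. (45a)
assert vform([]) == "lf"                               # Zan sante.     (45b)
```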

Interestingly, clausal complements do not trigger the verb short form (Henri 2010: 131 analyses them as extraposed). The complementizer (*ki*) is optional.

	- b. Mari trouve [(ki) so mama tro manze].
	     Mari find.LF that POSS mother too.much eat.LF
	     'Mari finds that her mother eats too much.'

On the other hand, subject-raising and subject-control verbs occur in a short form before a verbal complement.


The same is true with object-control and object-raising verbs:


#### 12 Control and Raising

Raising and control verbs thus differ from verbs taking sentential complements. Their SF form is predicted if they take unsaturated VP complements. Assuming the same lexical type hierarchy as defined above, verbs like *kontign* 'continue' and *sey* 'try' inherit from *sr-v-lx* and *sc-v-lx* respectively.<sup>13</sup>

# **2.4 Raising and control in pro-drop and ergative languages**

The theory of raising and control presented above naturally extends to pro-drop and ergative languages. But a distinction must be made between subject and first syntactic argument. Since Bouma, Malouf & Sag (2001), it is widely assumed that syntactic arguments are listed in ARG-ST and that only canonical ones are present in the valence lists (SUBJ, SPR and COMPS). See the Argument Realization Principle (ARP) in Abeillé & Borsley (2021: 17), Chapter 1 of this volume. For pro-drop languages, it has been proposed, e.g., in Manning & Sag (1999: 65), that null subject verbs have a first argument of the non-canonical *synsem* type *pro*, representing the unexpressed subject in the ARG-ST list, but nothing in their SUBJ list.


The lexical descriptions for (50b) and (50c) are as follows:

(51) a. *posso* 'can' (*sr-v-lx*):
        [SUBJ ⟨⟩, COMPS ⟨ 2 ⟩, ARG-ST ⟨ 1 [*pro*], 2 [SUBJ ⟨ 1 ⟩]⟩]

<sup>13</sup>Henri & Laurens use Sign-based Construction Grammar (SBCG) (see Abeillé & Borsley 2021: Section 7.2, Chapter 1 of this volume and Müller 2021c: Section 1.3.2, Chapter 32 of this volume), but their analyses can be adapted to the feature geometry of Constructional HPSG (Sag 1997) assumed in this volume. The analysis of control verbs sketched here will be revised in Section 2.5 below.


     b. *voglio* 'want' (*sc-v-lx*):
        [SUBJ ⟨⟩, COMPS ⟨ 2 ⟩, ARG-ST ⟨NP_i[*pro*], 2 [SUBJ ⟨[IND *i*]⟩]⟩]

Balinese, an ergative language, provides another example of non-canonical subjects. Wechsler & Arka (1998) argue that the subject is not necessarily the first syntactic argument in this language. A transitive verb has two verb forms, called "voice", and there is rigid SVO order, regardless of the verb's voice form. In the agentive voice (AV), the subject is the ARG-ST-initial member, while in the objective voice (OV), the verb is transitive, and the subject is the initial NP, although it is not the first element of the ARG-ST list (see Davis, Koenig & Wechsler 2021: Section 3.3, Chapter 9 of this volume):

(52) a. Ida ng-adol bawi. (Balinese)
        3SG AV-sell pig
        'He/She sold a pig.'
     b. Bawi adol ida.
        pig OV.sell 3SG
        'He/She sold a pig.'

Different properties argue in favor of a subject status of the first NP in the objective voice. Binding properties show that the agent is always the first element on the ARG-ST list; see Wechsler & Arka (1998), Manning & Sag (1999) and Müller (2021a: Section 5), Chapter 20 of this volume. The objective voice is also different from the passive: the passive may have a passive prefix and an agent *by*-phrase, and it does not constrain the thematic role of its subject. The two verbal types can be defined as follows (see Davis, Koenig & Wechsler 2021: Section 3.3, Chapter 9 of this volume):

(53) a. *av-verb* ⇒ [SUBJ 1 , COMPS 2 , ARG-ST 1 ⊕ 2 ]
     b. *ov-verb* ⇒ [SUBJ 1 , COMPS 2 , ARG-ST 2 ⊕ 1 ]
Together with a constraint stating that the SUBJ list has at most one element, these constraints license the following two verb forms:


(54) a. Lexical description of *ng-adol* 'sell.AV':
        [SUBJ ⟨ 1 NP⟩, COMPS ⟨ 2 NP⟩, ARG-ST ⟨ 1 , 2 ⟩]
     b. Lexical description of *adol* 'sell.OV':
        [SUBJ ⟨ 1 NP⟩, COMPS ⟨ 2 NP⟩, ARG-ST ⟨ 2 , 1 ⟩]
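A compact way to see what (53) and (54) say is to compute both valence mappings from one and the same ARG-ST. This is a sketch under my own encoding (strings standing in for NP descriptions), not an implementation of the chapter's AVMs:

```python
def av_verb(arg_st):
    """(53a): ARG-ST = SUBJ + COMPS, so the first argument is the subject."""
    return {"subj": arg_st[:1], "comps": arg_st[1:], "arg_st": arg_st}

def ov_verb(arg_st):
    """(53b): ARG-ST = COMPS + SUBJ, so the last argument is the subject."""
    return {"subj": arg_st[-1:], "comps": arg_st[:-1], "arg_st": arg_st}

sell = ["NP-agent", "NP-theme"]    # the agent is ARG-ST-initial (binding)
assert av_verb(sell)["subj"] == ["NP-agent"]   # ng-adol: agent subject (52a)
assert ov_verb(sell)["subj"] == ["NP-theme"]   # adol: theme subject (52b)
```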

In this analysis, the preverbal argument, whether the theme of an OV verb or the agent of an AV verb, is the subject, and as in many languages, only a subject can be raised or controlled (Chomsky 1981; Zaenen et al. 1985). Thus the first argument of the verb is controlled when the embedded verb is in the agentive voice, and the second argument is controlled when the verb is in the objective voice.<sup>14</sup>

(55) a. Tiang edot [teka]. (Balinese)
        1 want come
        'I want to come.'


Similarly, only the agent can be "raised" when the embedded verb is in the agentive voice, since it is the subject. And only the patient can be "raised" (because that is the subject) when the embedded verb is in the objective voice:<sup>15</sup>

(56) a. Ci ngenah sajan ngengkebang kapelihan-ne. (Balinese)
        2 seem much AV.hide mistake-3POSS
        'You seem to be hiding his/her wrongdoing.'
     b. Kapelihan-ne ngenah sajan engkebang ci.
        mistake-3POSS seem much OV.hide 2
        'His/her wrongdoings seem to be hidden by you.'

<sup>14</sup>The examples in (55) are taken from Wechsler & Arka (1998: ex 25).

<sup>15</sup>The examples in (56) are taken from Wechsler & Arka (1998: 391–392).


Turning now to ditransitive verbs, *majanji* 'promise' denotes a commitment relation, so the promiser must have semantic control over the action promised (Farkas 1988; Kroeger 1993: Section 2.4; Sag & Pollard 1991: 78). The promiser should therefore be the agent of the lower verb. This semantic constraint interacts with the syntactic constraint that the controllee must be the subject, predicting that the lower verb must be in agentive voice, with an agentive subject:<sup>16</sup>

	- 'I promised to give Nyoman money.'
	- b. * Tiang majanji Nyoman baang pipis.
	       1 promise Nyoman OV.give money
	- c. * Tiang majanji pipis baang Nyoman.
	       1 promise money OV.give Nyoman

The same facts obtain for other control verbs such as *paksa* 'force'. Turning now to object-raising verbs like *tawang* 'know', these can occur in the agentive voice with an embedded AV verb (58a) and with an embedded OV verb (58c), unlike control verbs like *majanji* 'promise'. They can also occur in the objective voice when the subject of the embedded verb is raised. In (58b), the embedded verb (*nangkep* 'arrest') is in the agentive voice, and its subject (*polisi* 'police') is raised to the subject of *tawang* 'know' in the objective voice; in (58d), the embedded verb (*tangkep* 'arrest') is in the objective voice, and its subject (*Wayan*) is raised to the subject of *tawang* 'know' in the objective voice (Wechsler & Arka 1998: ex 23).

	- b. Polisi tawang=a lakar nangkep Wayan.
	     police OV.know=3 FUT AV.arrest Wayan
	- c. Ia nawang Wayan lakar tangkep polisi.
	     3 AV.know Wayan FUT OV.arrest police
	     'He knew that the police would arrest Wayan.'
	- d. Wayan tawang=a lakar tangkep polisi.
	     Wayan OV.know=3 FUT OV.arrest police

In Balinese, the subject is always the controlled (or "raised") element, but it is not necessarily the first argument of the embedded verb. The semantic difference between control verbs and raising verbs has a consequence for their complementation: raising verbs (which do not constrain the semantic role of the raised argument) can take verbal complements either in the agentive or objective voice, like subject-control verbs, while object-control verbs (which select an agentive argument) can only take a verbal complement in the agentive voice. This difference is a result of the analysis of raising and control presented above, and nothing else has to be added.

<sup>16</sup>The examples in (57) are taken from Wechsler & Arka (1998: 398–399).

# **2.5 XARG and an alternative HPSG analysis**

Sometimes, obligatory control is also attested for verbal complements with an expressed subject. As noted by Zec (1987), Farkas (1988) and Gerdts & Hukari (2001: 115–116), in some languages, such as Romanian, Japanese (Kuno 1976; Iida 1996) or Persian (Karimi 2008), the expressed subject of a verbal complement may display obligatory control. This may be a challenge for the theory of control presented here, since a clausal complement is a saturated complement with an empty SUBJ list, and the matrix verb cannot access the SUBJ value of the embedded verb. Sag & Pollard (1991: 89) proposed a semantic feature external-argument (EXT-ARG), which makes the index of the subject argument available at the clausal level. Sag (2007: 409) proposed to introduce a Head feature XARG that takes as its value the first syntactic argument of the head verb and is accessible at the clause level.

This is adopted by Henri & Laurens (2011: Section 6) for Mauritian. After some subject-control verbs like *pans* 'think', the embedded verb may have an optional clitic subject which must be coindexed with the matrix subject. It is not a clausal complement since the matrix verb is in the short form (SF) and not in the long form (see (46) above).

(59) Zan pans (*ki) (li) vini.<sup>17</sup> (Mauritian)
     Zan think.SF that 3SG come.LF
     'Zan thinks about coming.'

Using XARG, Henri & Laurens (2011: 214) propose a description for *pans* 'think' that is simplified in (60). The complement of *pans* must have an XARG coindexed with the subject of *pans*, but its SUBJ list is not constrained: it can be a saturated verbal complement (whose SUBJ value is the empty list) or a VP complement (whose SUBJ value is not the empty list).


<sup>17</sup>Henri & Laurens (2011: 202)


(60) Lexical description of *pans* 'think':
     [SUBJ ⟨NP_i⟩, COMPS ⟨[HEAD *verb*, XARG [IND *i*]]⟩]
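The import of (60) is that selection targets XARG while leaving SUBJ saturation open. A minimal sketch, under my own dictionary encoding and with the invented helper `pans_selects`, shows that a saturated clause and an unsaturated VP satisfy the same COMPS description:

```python
def pans_selects(comp, subj_index):
    """True if comp satisfies the COMPS description in (60)."""
    return comp["head"] == "verb" and comp["xarg"]["ind"] == subj_index

# Clausal complement (SUBJ saturated) vs. VP complement (SUBJ open).
clause = {"head": "verb", "subj": [], "xarg": {"ind": "i"}}          # li vini
vp = {"head": "verb", "subj": [{"ind": "i"}], "xarg": {"ind": "i"}}  # vini

assert pans_selects(clause, "i") and pans_selects(vp, "i")
assert not pans_selects({"head": "verb", "xarg": {"ind": "j"}}, "i")
```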

See also Sag (2007: 408–409) and Kay & Sag (2009) for the obligatory control of possessive determiners in English expressions such as *keep one's cool*, *lose one's temper*, with an XARG feature on nouns and NPs:

	- b. Mary lost \* his / her temper.

This coindexing can also be extended to some subject-raising verbs such as *look like*, which have been called "copy raising" (Rogers 1974; Hornstein 1999 a.o.): *look like* takes a finite complement with an overt subject, and this pronominal subject must be coindexed with the matrix subject; it is a raising predicate, as shown by the possibility of the expletive *there*:

	- b. There looks like there's going to be a storm.<sup>18</sup>

This bears some similarity with English tag questions: the subject of the tag question must be pronominal and coindexed with that of the matrix clause (see Bender & Flickinger 1999, and this chapter Section 4 on auxiliary verbs):

	- b. It rained yesterday, didn't it?

To account for such cases, the types for subject-raising and subject-control verb lexemes in (30a) and (37a) can thus be revised as follows. Assuming a tripartition of *index* into *referential*, *there* and *it* (Pollard & Sag 1994: 138), the only difference between subject raising and subject control is that the INDEX of the subject of control verbs must be referential:<sup>19</sup>

(64) a. *sr-v-lx* ⇒ [ARG-ST ⟨XP_i, …, [XARG [IND *i*]]⟩]
     b. *sc-v-lx* ⇒ [ARG-ST ⟨NP_i, …, [XARG [IND *i*]]⟩]
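The difference between (64a) and (64b) then reduces to one extra condition on the shared index. A toy rendering (my own encoding; `ok_sr` and `ok_sc` are invented names) makes the expletive facts from Section 1.1 fall out:

```python
def ok_sr(first_arg, comp):
    """(64a): the first argument's index IS the complement's XARG index."""
    return first_arg["ind"] is comp["xarg"]

def ok_sc(first_arg, comp):
    """(64b): same coindexing, but the index must also be referential."""
    return ok_sr(first_arg, comp) and first_arg["ind"]["sort"] == "referential"

it_index = {"sort": "it"}   # tripartition of index: referential / there / it
assert ok_sr({"ind": it_index}, {"xarg": it_index})      # It seems to rain.
assert not ok_sc({"ind": it_index}, {"xarg": it_index})  # * It wants to rain.
```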

<sup>18</sup>Sag (2007: 407)

<sup>19</sup>This coindexing follows from the fact that control verbs assign a semantic role to their subject and the subject is coindexed with the subject of the controlled verb. Some authors have independently argued that some verbs have either a control-like or a raising-like behavior depending on the agentivity of their subject; see Perlmutter (1970) for English aspectual verbs (*begin*, *stop*) and Ruwet (1991: 56) for French verbs like *menacer* ('threaten') and *promettre* ('promise').


Note that this approach does not work for those languages allowing subjectless verbs (see example (10)).

# **3 Copular constructions**

Copular verbs can also be considered as "raising" verbs (Chomsky 1981: 106). While attributive adjectives are adjoined to N or NP, predicative adjectives are complements of copular verbs and share their subject with these verbs. Like raising verbs (Section 1.3), copular verbs come in two varieties: subject copular verbs (*be*, *get*, *seem*), and object copular verbs (*consider*, *prove*, *expect*).

Let us review a few properties of copular constructions. The adjective selects for the verb's subject or object: *likely* may select a nominal or a sentential argument, while *expensive* only takes a nominal argument. As a result, *seem* combined with *expensive* only takes a nominal subject, and *consider* combined with the same adjective only takes a nominal object.

	- b. [This trip] / \* [That he comes] seems expensive.
	- b. I consider [this trip] expensive / \* expensive [that he comes].

A copular verb thus takes any subject (or object) allowed by the predicate: *be* can take a PP subject in English with a proper predicate like 'a good place to hide' (67a), and *werden* takes no subject when combined with a subjectless predicate like *schlecht* 'sick' in German (67b):

	- b. Ihm wurde schlecht.<sup>20</sup> (German)
	     him.DAT got sick
	     'He got sick.'

In English, *be* also has the properties of an auxiliary; see Section 3.2.

# **3.1 The problems with a small clause analysis**

To account for the above properties, Transformational Grammar since Stowell (1983) and Chomsky (1986: Chapter 4) has proposed a clausal or *small clause* analysis: the second predicate (the predicative adjective) heads a (small) clause; its subject raises to the subject position of the matrix verb (68a) or stays in its embedded position and receives accusative case from the matrix verb via exceptional case marking (ECM), as seen above (68b).

<sup>20</sup>Müller (2002: 72)

(68) a. [NP e] be [<sup>S</sup> John sick] → [NP John] is [<sup>S</sup> John sick]
     b. We consider [<sup>S</sup> John sick].

It is true that the adjective may combine with its subject to form a verbless sentence; this happens in African American Vernacular English (AAVE) (Bender 2001), in French (Laurens 2008), in creole languages (Henri & Abeillé 2007: 134), in Slavic languages (Stassen 1997: 62) and in Semitic languages (see Alotaibi & Borsley 2020: 20–26), among others.

(69) Magnifique ce chapeau ! (French)
     beautiful this hat
     'What a beautiful hat!'

But this does not entail that copular verbs like *be* take a sentential complement. Several arguments can be presented against a (small) clause analysis. The putative sentential source is sometimes attested (70c) but more often ungrammatical:

	- b. \* It gets / becomes that John is sick.
	- c. John considers Lou a friend / that Lou is a friend.
	- d. Paul regards Mary as crazy.
	- e. \* Paul regards that Mary is crazy.

When a clausal complement is possible, its properties differ from those of the putative small clause. Pseudo-clefting shows that *Lou a friend* is not a constituent in (71a). (71a) does not mean exactly the same as (71c). Furthermore, as pointed out by Williams (1983), the embedded predicate can be questioned independently of the first NP, which would be very unusual if it were the head of a small clause (71e).

	- b. \* What we consider is Lou a friend.
	- c. We consider [that Lou is a friend].
	- d. What we consider is [that Lou is a friend].
	- e. What do you consider Lou?


Following Bresnan (1982: 420–423), Pollard & Sag (1994: 113) also show that Heavy-NP shift applies to the putative subject of the small clause, exactly as it applies to the first complement of a two-complement verb:

	- b. We would consider [acceptable] [any candidate who supports the proposed amendment].
	- c. I showed [all the cookies] [to Dana].
	- d. I showed [to Dana] [all the cookies that could be made from betel nuts and molasses].

Indeed, the "subject" of the adjective with object-raising verbs has all the properties of an object: it bears accusative case and it can be the subject of a passive:

	- b. We consider that he / \* him is guilty.
	- c. He was proven guilty (by the jury).

Furthermore, the matrix verb may select the head of the putative small clause, which is not the case with verbs taking a clausal complement, and which would violate the locality of subcategorization (Pollard & Sag 1994: 102; Sag 2007) under a small clause analysis. The verb *expect* takes a predicative adjective but not a preposition or a nominal predicate (74); *get* selects a predicative adjective or a preposition (75), but not a predicative nominal; and *prove* selects a predicative noun or adjective but not a preposition (76).

	- b. I expect that island \*(to be) off the route. (p. 103)
	- c. I expect that island \*(to be) a good vacation spot. (p. 103)

# **3.2 An HPSG analysis of copular verbs**

Copular verbs such as *be* or *consider* are analyzed as subtypes of subject-raising verbs and object-raising verbs respectively and hence, the constraints in (30) apply. They share their subject (or object) with the unexpressed subject of their predicative complement. Instead of taking a VP complement, they take a predicative complement (PRD +), whose category they may select. We can thus define a general type for verbs taking a predicative complement as in (77), and then two subtypes of verbs taking a predicative complement: *s-pred-v-lx* for verbs like *be*, which also inherit from subject-raising verbs, and *o-pred-v-lx* for verbs like *consider*, which also inherit from object-raising verbs.

(77) *pred-lx* ⇒ [ARG-ST ⟨…, [PRD +]⟩]

A copular verb like *be* or *seem* does not assign any semantic role to its subject, while verbs like *consider* or *expect* do not assign any semantic role to their object. For more details, see Pollard & Sag (1994: Chapter 3), Müller (2002: Section 2.2.7; 2009) and Van Eynde (2015). The lexical descriptions for predicative *seem* and predicative *consider* inherit from the *s-pred-v-lx* type and *o-pred-v-lx* type respectively, and are simplified as shown below.

As in Section 2.1, I ignore here a possible PP complement (*John seems smart to me*). With the assumption that the SUBJ list contains exactly one element in English, the following lexical descriptions result:

(78) Lexical description of *seem* (*s-pred-v-lx*):
     [SUBJ ⟨ 1 ⟩,
      COMPS ⟨ 2 [HEAD [PRD +], SUBJ ⟨ 1 ⟩, CONT [IND 3 ]]⟩,
      ARG-ST ⟨ 1 , 2 ⟩,
      CONT [RELS ⟨[*seem-rel*, SOA 3 ]⟩]]

(79) Lexical description of *consider* (*o-pred-v-lx*):

     [SUBJ ⟨ 1 NP_i⟩,
      COMPS ⟨ 2 , 3 [HEAD [PRD +], SUBJ ⟨ 2 ⟩, CONT [IND 4 ]]⟩,
      ARG-ST ⟨ 1 , 2 , 3 ⟩,
      CONT [RELS ⟨[*consider-rel*, EXP *i*, SOA 4 ]⟩]]

The subject of *seem* is unspecified: it can be any category selected by the predicative complement; the same holds for the first complement of *consider* (see examples in (65) above). *Consider* selects a subject and two complements, but only takes two semantic arguments: one corresponding to its subject, and one corresponding to its predicative complement. It does not assign a semantic role to its non-predicative complement.
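To see the division of labor in (78), here is a rough sketch under my own encoding: the copular verb checks that its complement is [PRD +] and simply re-exposes the predicate's SUBJ element as its own subject, assigning it no semantic role of its own.

```python
def seem_entry(pred):
    """seem (s-pred-v-lx): SUBJ is the predicative complement's SUBJ element."""
    assert pred["prd"]                       # the complement must be [PRD +]
    return {"subj": pred["subj"], "comps": [pred]}

happy = {"prd": True, "subj": [{"cat": "NP", "ind": "i"}]}
seem = seem_entry(happy)
# 'Paul' satisfies both requirements at once: full structure sharing.
assert seem["subj"][0] is happy["subj"][0]
```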

Let us take the example *Paul seems happy*. As a predicative adjective, *happy* has a HEAD feature [PRD +] and its SUBJ feature is not the empty list: it subcategorizes for a nominal subject and assigns a semantic role to it, as shown in (80).

(80) Lexical description of *happy*:
     [PHON ⟨*happy*⟩,
      HEAD [*adj*, PRD +],
      SUBJ ⟨NP_i⟩,
      COMPS ⟨⟩,
      CONT [RELS ⟨[*happy-rel*, EXP *i*]⟩]]

In the trees in Figures 7 and 8, the SUBJ feature of *happy* is shared with the SUBJ feature of *seem* and the first element of the COMPS list of *consider*.<sup>21</sup>

Pollard & Sag (1994: 133) mention a few verbs taking a predicative complement which can be considered as control verbs. A verb like *feel* selects a nominal subject and assigns a semantic role to it.

(81) John feels tired / at ease.

It inherits from the subject-control-verb type (37); its lexical description is given in (82):

(82) Lexical description of *feel* (*sc-v-lx*):
     [SUBJ ⟨ 1 NP_i⟩,
      COMPS ⟨ 2 [HEAD [PRD +], SUBJ ⟨NP_i⟩, CONT [IND 3 ]]⟩,
      ARG-ST ⟨ 1 , 2 ⟩,
      CONT [RELS ⟨[*feel-rel*, EXP *i*, SOA 3 ]⟩]]

 <sup>21</sup>In what follows, I ignore adjectives taking complements. As noted in Section 1, adjectives may take a non-finite VP complement and fall under a control or raising type: as a subject-raising adjective, *likely* shares the SYNSEM value of its subject with the expected subject of its VP complement; as a subject-control adjective, *eager* coindexes both subjects. Such adjectives thus inherit from *subj-rsg-lexeme* and *subj-control-lexeme*, respectively, as well as from *adjectivelexeme*. In some languages, copular constructions are complex predicates, which means that the copular verb inherits the complements of the adjective as well; see Abeillé & Godard (2001) and Godard & Samvelian (2021: Section 4.4 and 5.1.3), Chapter 11 of this volume.

Figure 7: A sentence with an intransitive copular verb

Figure 8: A sentence with a transitive copular verb


# **3.3 Copular verbs in Mauritian**

As shown by Henri & Laurens (2011), and as was the case for other raising verbs (see Section 2.3), Mauritian data provide a strong argument in favor of a non-clausal analysis. A copular verb takes a short form before a predicative complement and a long form before a clausal one. Despite the lack of inflection on the embedded verb and the possibility of subject pro-drop, clausal complements differ from non-clausal complements in the following properties: they do not trigger the matrix verb's short form, they may be introduced by the complementizer *ki*, and their subject is a weak pronoun (*mo* 'I', *to* 'you'). On the other hand, a VP or AP complement cannot be introduced by *ki*, and an NP complement must be realized as a strong pronoun (*mwa* 'me', *twa* 'you'). So *malad* 'sick' is an adjectival complement in (83a), (83b), and (83d), not a small clause; *trouv* 'find' takes two complements in (83b) and (83d), while *trouve* 'find' takes one clausal complement in (83c). See Section 2.3 above for the alternation between the short form (SF) and long form (LF) of verbs.

(83) a. Mari ti res malad.
     Mari PST remain.SF sick
     'Mari remained sick.'

(Henri & Laurens 2011: 198)


Henri & Laurens (2011: 218) conclude that "Complements of raising and control verbs systematically pattern with non-clausal phrases such as NPs or PPs. This kind of evidence is seldom available in world's languages because heads are not usually sensitive to the properties of their complements. The analysis as clause or small clauses is also problematic because of the existence of genuine verbless clauses in Mauritian which pattern with verbal clauses and not with complements of raising and control verbs".


# **4 Auxiliaries as raising verbs**

Following Ross (1969), Gazdar et al. (1982), and Sag et al. (2020), *be*, *do*, *have* and modals (e.g., *can*, *should*) in HPSG are not considered to have a special part of speech (*Aux* or *Infl*),<sup>22</sup> but are verbs with the head feature [AUX +].

English auxiliaries take VP (or XP) complements and neither impose categorial restrictions on their subject nor assign it a semantic role, just like other subject-raising verbs. They are thus compatible with non-referential subjects, such as meteorological *it* and existential *there*. They select the verb form of their non-finite complements: *have* selects a past participle, *be* a gerund-participle, and *can* and *will* a bare form.

	- b. Paul is leaving.
	- c. Paul can leave.
	- d. It will rain.
	- e. There can be a riot.
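Since the paragraph above reduces VFORM selection to a small lexical table, the following Python fragment summarises it for concreteness; the value names and the helper function are illustrative assumptions, not notation from this chapter.

```
# Toy summary of VFORM selection by English auxiliaries (value names are
# assumptions for exposition, not the chapter's notation).
VFORM_SELECTED = {
    "have": "past-participle",    # Paul has left.
    "be":   "gerund-participle",  # Paul is leaving.
    "can":  "base",               # Paul can leave.
    "will": "base",               # It will rain.
}

def licenses(aux, comp_vform):
    """True if the auxiliary selects a VP complement with this VFORM."""
    return VFORM_SELECTED[aux] == comp_vform

assert licenses("have", "past-participle")
assert not licenses("can", "gerund-participle")   # * Paul can leaving.
```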

In this approach, English auxiliaries are subtypes of subject-raising verbs and thus take a VP (or XP) complement and share their subject with the unexpressed subject of the non-finite verb (see Section 2.1).<sup>23</sup> The lexical descriptions for the auxiliaries *will* and *have* are given in (85) and (86).



<sup>22</sup>Having Infl as a syntactic category and sentences defined as IP does not account for languages without inflection, nor for verbless sentences; see for example Laurens (2008).

<sup>23</sup>*Be* is an auxiliary and a subject-raising verb with a PRD + complement (see Section 3.2 above) or a gerund-participle VP complement, different from the identity *be*, which is not a raising verb (see Van Eynde 2008 and Müller 2009 on predication). A verb like *dare*, shown to be an auxiliary by its postnominal negation, is not a raising verb but a subject-control verb:

	- b. I dare not be late.
	- c. # It will not dare rain.


(85) Lexical description of *will* (*sr-v-lx*):

$$\begin{bmatrix}
\text{HEAD} & \left[ \text{AUX}\ + \right] \\
\text{SUBJ} & \left\langle \boxed{1} \right\rangle \\
\text{COMPS} & \left\langle \boxed{2}\,\text{VP} \begin{bmatrix} \text{HEAD} & \left[ \text{VFORM}\ \textit{base} \right] \\ \text{SUBJ} & \left\langle \boxed{1} \right\rangle \\ \text{CONT} & \left[ \text{IND}\ \boxed{3} \right] \end{bmatrix} \right\rangle \\
\text{ARG-ST} & \left\langle \boxed{1}, \boxed{2} \right\rangle \\
\text{CONT} & \begin{bmatrix} \text{IND} & \textit{s} \\ \text{RELS} & \left\langle \begin{bmatrix} \textit{future-rel} \\ \text{SOA} & \boxed{3} \end{bmatrix} \right\rangle \end{bmatrix}
\end{bmatrix}$$

(86) Lexical description of *have* (*sr-v-lx*):

$$\begin{bmatrix}
\text{HEAD} & \left[ \text{AUX}\ + \right] \\
\text{SUBJ} & \left\langle \boxed{1} \right\rangle \\
\text{COMPS} & \left\langle \boxed{2}\,\text{VP} \begin{bmatrix} \text{HEAD} & \left[ \text{VFORM}\ \textit{past-part} \right] \\ \text{SUBJ} & \left\langle \boxed{1} \right\rangle \\ \text{CONT} & \left[ \text{IND}\ \boxed{3} \right] \end{bmatrix} \right\rangle \\
\text{ARG-ST} & \left\langle \boxed{1}, \boxed{2} \right\rangle \\
\text{CONT} & \begin{bmatrix} \text{IND} & \textit{s} \\ \text{RELS} & \left\langle \begin{bmatrix} \textit{perfect-rel} \\ \text{SOA} & \boxed{3} \end{bmatrix} \right\rangle \end{bmatrix}
\end{bmatrix}$$

To account for their NICE (negation, inversion, contraction (*isn't*, *won't*), ellipsis) properties, Kim & Sag (2002) use a binary head feature AUX, so that only [AUX +] verbs allow for subject inversion (87a), sentential negation (87c), contraction, or VP ellipsis (87e). See Müller (2021b: Section 5), Chapter 10 of this volume on subject inversion, Kim (2021: Section 2.3), Chapter 18 of this volume on negation, and Nykiel & Kim (2021: Section 5), Chapter 19 of this volume on post-auxiliary ellipsis.<sup>24</sup>

(87) a. Is Paul working?
	- b. \* Keeps Paul working?
	- c. Paul is (probably) not working.
	- d. \* Paul keeps (probably) not working.
	- e. John promised to come and he will.
	- f. \* John promised to come and he seems.

Subject-raising verbs such as *seem*, *keep*, or *start* are [AUX −].

<sup>24</sup>Copular *be* has the NICE properties (*Is John happy?*); it is an auxiliary verb with a [PRD +] complement. Since *to* allows for VP ellipsis, it is also analyzed as an auxiliary verb: *John promised to work and he started to*. See Gazdar, Pullum & Sag (1982: 600) and Levine (2012).


Sag et al. (2020) revised this analysis, proposing a new one couched in Sign-Based Construction Grammar (Sag 2012; see also Müller 2021c: Section 1.3.2, Chapter 32 of this volume). The descriptions used below have been translated into the feature geometry of Constructional HPSG (Sag 1997), which is used in this volume. In their approach, the head feature AUX is both lexical and constructional: the constructions restricted to auxiliaries require their head to be [AUX +], while the constructions available to all verbs are [AUX −]. In this approach, non-auxiliary verbs are lexically specified as [AUX −] and [INV −].

Auxiliary verbs, on the other hand, are unspecified for the feature AUX and are contextually specified, except for unstressed *do*, which is [AUX +] and must occur in constructions restricted to auxiliaries.
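The interplay of lexical underspecification and contextual resolution of AUX can be pictured as simple unification. The sketch below is purely illustrative; the dictionary, the use of None for an unspecified value, and the function name are expository assumptions, not Sag et al.'s notation.

```
# Illustrative sketch: None marks a lexically unspecified AUX value,
# to be resolved by the construction the verb occurs in.
LEXICAL_AUX = {
    "seem": "-",   # non-auxiliary verbs are lexically [AUX -]
    "keep": "-",
    "will": None,  # most auxiliaries are unspecified for AUX
    "have": None,
    "do":   "+",   # unstressed 'do' is [AUX +]: auxiliary constructions only
}

def fits(verb, construction_aux):
    """A verb fits a construction if its lexical AUX value unifies with
    the construction's requirement (None unifies with anything)."""
    lex = LEXICAL_AUX[verb]
    return lex is None or lex == construction_aux

assert fits("will", "+") and fits("will", "-")
assert fits("do", "+") and not fits("do", "-")
assert not fits("seem", "+")
```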


# **4.1 Subject inversion and English auxiliaries**

Subject inversion is handled by a subtype of head-subject-complement phrase, which is independently needed for verb-initial languages like Welsh (Borsley 1999: 285; Sag et al. 2003: 410).<sup>25</sup> It is a specific (non-binary) construction, of which other constructions such as *polar-interrogative-clause* are subtypes, and whose head must be [INV +].

(89) *initial-aux-ph* ⇒

$$\begin{bmatrix}
\text{SUBJ} & \left\langle \right\rangle \\
\text{COMPS} & \left\langle \right\rangle \\
\text{HEAD-DTR} & \boxed{0} \begin{bmatrix} \text{HEAD} & \left[ \text{AUX}\ +,\ \text{INV}\ + \right] \\ \text{SUBJ} & \left\langle \boxed{1} \right\rangle \\ \text{COMPS} & \boxed{2} \end{bmatrix} \\
\text{DTRS} & \left\langle \boxed{0}, \left[ \text{SYNSEM}\ \boxed{1} \right] \right\rangle \oplus \boxed{2}
\end{bmatrix}$$

<sup>25</sup>As noted in Abeillé & Borsley (2021: 28), Chapter 1 of this volume, in some HPSG work, e.g., Sag et al. (2003: 409–414), examples like (88b) and (88d) are analyzed as involving an auxiliary verb with two complements and no subject. This approach has no need for an additional phrase type, but it requires an alternative valence description for auxiliary verbs.


Most auxiliaries are lexically unspecified for the feature INV and allow for both constructions (non-inverted and inverted), while the first person *aren't* is obligatorily inverted (lexically marked as [INV +]) and the modal *better* obligatorily non-inverted (lexically marked as [INV −]):

(90) a. Aren't I dreaming?
	- b. \* I aren't dreaming.
	- c. We better be careful.
	- d. \* Better we be careful?

As for tag questions (*Paul left, didn't he?* (63a)), they can be defined as special adjuncts, coindexing their subject with that of the sentence they adjoin to, using the XARG feature (see above Section 2.5).

(91) *tag-aux-lx* ⇒

$$\begin{bmatrix}
\text{HEAD} & \begin{bmatrix} \text{INV} & + \\ \text{TENSE} & \boxed{1} \\ \text{POL} & \text{not}(\boxed{2}) \\ \text{MOD} & \text{S} \begin{bmatrix} \text{TENSE} & \boxed{1} \\ \text{POL} & \boxed{2} \\ \text{XARG} & \text{NP}_{i} \end{bmatrix} \end{bmatrix} \\
\text{SUBJ} & \left\langle \begin{bmatrix} \text{CONT} & \begin{bmatrix} \textit{pron} \\ \text{IND} & i \end{bmatrix} \end{bmatrix} \right\rangle \\
\text{COMPS} & \left\langle \right\rangle
\end{bmatrix}$$

*not* is a function that returns '+' for the input '−' and '−' for the input '+'. I use coindexing of TENSE to ensure time concordance between the main verb and the tag auxiliary. *pron* denotes a subject with a pronominal content.
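As a concrete rendering of the *not* function, here is a minimal Python sketch; representing the polarity values as the strings '+' and '-' is an assumption made only for exposition.

```
def polarity_not(pol):
    """Reverse a polarity value: '+' for '-' and '-' for '+'."""
    assert pol in {"+", "-"}
    return "-" if pol == "+" else "+"

assert polarity_not("+") == "-"   # Paul left, didn't he?
assert polarity_not("-") == "+"   # Paul didn't leave, did he?
```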

# **4.2 English auxiliaries and ellipsis**

While the distinction between VP ellipsis (*Paul can*) and null complement anaphora (*Paul tried*) is not always easy to make, Sag et al. (2020) observe that certain elliptical constructions are restricted to auxiliaries, for example pseudogapping (see also Nykiel & Kim 2021: Section 2.2, Chapter 19 of this volume and Miller 2014).

	- b. Larry might read the short story, but he won't the play.
	- c. \* Ann seems to buy more bagels than Sue seems cupcakes.


This could be captured by having the relevant auxiliaries optionally inherit the complements of their verbal complement.<sup>26</sup> An additional lexical description of *will* with complement inheritance could be the following, using the non-canonical *synsem* type *pro* for the unexpressed VP:

(93) Lexical description of elliptical *will* (VPE or pseudogapping):

$$\begin{bmatrix}
\text{SUBJ} & \left\langle \boxed{1} \right\rangle \\
\text{COMPS} & \boxed{2} \\
\text{ARG-ST} & \left\langle \boxed{1},\ \text{VP} \begin{bmatrix} \textit{pro} \\ \text{SUBJ} & \left\langle \boxed{1} \right\rangle \\ \text{COMPS} & \boxed{2} \end{bmatrix} \right\rangle
\end{bmatrix}$$

If the list 2 is empty, this entry covers VP ellipsis (*I will*); if it is not empty, it covers pseudogapping (*I will the play*).
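The role of the inherited COMPS list (tag 2) can be made concrete with a small illustrative sketch; the list encoding of complements is an expository assumption, not the chapter's formalism.

```
def classify_ellipsis(inherited_comps):
    """VP ellipsis if the inherited COMPS list is empty,
    pseudogapping otherwise."""
    return "VP ellipsis" if not inherited_comps else "pseudogapping"

print(classify_ellipsis([]))               # "I will."          -> VP ellipsis
print(classify_ellipsis(["NP the play"]))  # "I will the play." -> pseudogapping
```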

As observed by Arnold & Borsley (2008), auxiliaries can be stranded in certain non-restrictive relative clauses such as (94a), whereas no such possibility is open to non-auxiliary verbs (94b) (see also Arnold & Godard 2021: 635, Chapter 14 of this volume):

	- b. \* Kim tried to impress Lee, which Sandy didn't try. (Sag et al. 2020: ex. 54a)

The HPSG analysis sketched here captures a very wide range of facts, and expresses both generalizations (English auxiliaries are subtypes of subject-raising verbs) and lexical idiosyncrasies (copula *be* takes non-verbal complements, first person *aren't* triggers obligatory inversion, etc.).

# **5 Conclusion**

Complements of "raising" and control verbs have been analyzed either as clauses (Chomsky 1981: 55–63) or as small clauses (Stowell 1981; 1983) in Transformational Grammar and Minimalism. As in LFG (Bresnan 1982), "raising" and control predicates are analyzed as taking non-clausal open complements in HPSG (Pollard & Sag 1994: Chapter 3), with sharing or coindexing of the (unexpressed) subject of the embedded predicate with their own subject (or object). This leads to a more accurate analysis of "object-raising" verbs as two-complement verbs, without the need for an exceptional case marking device. This analysis naturally extends to pro-drop and ergative languages; it also makes correct empirical predictions for languages that mark clausal complementation differently from VP complementation. A rich hierarchy of lexical types enables verbs and adjectives taking non-finite or predicative complements to inherit from a raising type or a control type. The Raising Principle prevents any other kind of non-canonical linking between semantic argument and syntactic argument. A semantics-based control theory predicts which predicates are subject-control and which object-control. The "subject-raising" analysis has been successfully extended to copular and auxiliary verbs, which are subtypes of raising verbs, without the need for an Infl category.

<sup>26</sup>See Kim & Sag (2002) for a comparison of French and English auxiliaries and Abeillé & Godard (2002) for a thorough analysis of French auxiliaries as "generalized" raising verbs, inheriting not only the subject but also any complement from the past participle. Such generalized raising was first suggested by Hinrichs & Nakazawa (1989; 1994) for German and has been adopted since in various analyses of verbal complexes in German (Kiss 1995; Meurers 2000; Kathol 2001; Müller 1999; 2002), Dutch (Bouma & van Noord 1998) and Persian (Müller 2010: Section 4). See also Godard & Samvelian (2021), Chapter 11 of this volume.

# **Abbreviations**


# **Acknowledgements**

I am grateful to the reviewers, Bob Borsley, Jean-Pierre Koenig and Stefan Müller for their helpful comments.

# **References**

Abeillé, Anne & Robert D. Borsley. 2021. Basic properties and elements. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 3–45. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599818.

*The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 89– 124. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599822.




# **Chapter 13**

# **Unbounded dependencies**

# Robert D. Borsley

University of Essex and Bangor University

# Berthold Crysmann

Centre national de la recherche scientifique (CNRS)

Unbounded dependencies of the kind that are found in *wh*-interrogatives, relative clauses, and other constructions have been a major focus of research in HPSG. They typically involve a gap of some kind and some distinctive higher structure, often involving a filler in a non-argument position with the properties of the gap. HPSG has developed detailed proposals about the bottom of the dependency, the middle, and the top. In the case of the top of the dependency, complex hierarchies of phrase types have been employed to handle the distinctive properties of the various unbounded dependency constructions. Analyses have also been developed for unbounded dependencies with a resumptive pronoun, the special properties of *wh*-interrogatives, extraposition phenomena, and filler-gap mismatches.

# **1 Introduction**

Since Ross (1967) and Chomsky (1977), it has been clear that many languages have a variety of constructions involving an unbounded (or long distance) dependency (henceforth UD). *Wh*-interrogatives and relative clauses are important examples, but, as we will see, there are many others. Typically these constructions contain a gap (in the sense that a dependent is missing) and some distinctive higher structure, and neither can appear without the other. The following illustrate:

(1) a. What did you put \_ on the table?
	- b. \* You put \_ on the table?
	- c. \* What did you put it on the table?

Robert D. Borsley & Berthold Crysmann. 2021. Unbounded dependencies. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 537–594. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599842


In (1a) there is a gap (indicated by the underscore) in object position and the distinctive higher structure involves the interrogative pronoun *what* and the presubject auxiliary *did*. (1b), where the gap is present but not the distinctive higher structure, is ungrammatical, as is (1c), where the distinctive higher structure appears but not the gap. The interrogative pronoun *what* in (1a) is known as a filler, a constituent in a non-argument position with the properties of the gap. But the distinctive higher structure does not always include a filler. English relative clauses may or may not have a filler:

(2) the book [(which) you put \_ on the table]

As we will see below, there are also UD constructions which never have a filler. When there is a filler in a UD construction, it normally has all the properties of the associated gap. Thus, in the following, the filler and the gap are of the same category:

	- b. [PP To whom] did Kim talk \_ (PP)?
	- c. [AP How long] is this piece of string \_ (AP)?
	- d. [AdvP How quickly] did you do it \_ (AdvP)?

They typically match in other respects as well. For example, if they are nominal, they match in number, as the following illustrate:

(4) a. [NP[*sg*] Which student] do you think \_ (NP[*sg*]) knows the answer?
	- b. [NP[*pl*] Which students] do you think \_ (NP[*pl*]) know the answer?
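Filler-gap matching of this kind can be pictured as identity of feature bundles. The following toy Python sketch illustrates the idea; the dictionary encoding of categories is an assumption made for exposition, not part of HPSG.

```
def matches(filler, gap):
    """True if every feature specified on the gap has the same value
    on the filler."""
    return all(filler.get(k) == v for k, v in gap.items())

assert matches({"cat": "NP", "num": "sg"}, {"cat": "NP", "num": "sg"})    # (4a)
assert not matches({"cat": "NP", "num": "pl"}, {"cat": "NP", "num": "sg"})
```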

In languages with grammatical gender or morphological case, they also share these properties. In addition to syntactic properties, unbounded dependencies also establish matching of semantic properties: i.e., in (1a), the filler *what* is understood to fill an argument role of *put*, just as an in situ complement would. The term *unbounded* is used here because the gap and the distinctive higher structure with which it is associated can be indefinitely far apart. The following illustrate:

	- b. What did she say she regrets that she put \_ on the table?
	- c. What do you think she says she regrets that she put \_ on the table?

There are, however, some restrictions here, commonly referred to as island phenomena. These are discussed by Chaves (2021), Chapter 15 of this volume. There are a few further points that we should make at the outset. We have focused so far on UD constructions where an obligatory dependent, a subject or complement, is missing. But UD constructions are certainly not restricted to subjects and complements. There are examples where the filler has an adjunct role, such as (3d) or the following:

$$\text{(6)}\quad \begin{cases} \text{Where} \\ \text{When} \\ \text{How} \\ \text{Why} \end{cases} \text{ did you talk to Lee \_?}$$

There are also UD constructions with no gap at all. Instead they have a so-called *resumptive pronoun* (RP). The following Welsh example with the RP in italics illustrates:

(7) Pa ddyn werthodd Ieuan y ceffyl iddo *fo*?
    which man sell.PAST.3SG Ieuan the horse to.3SG.M he
    'Which man did Ieuan sell the horse to?'

Finally, we should note that there are some cases where filler and gap do not match.

	- b. \* Which won't Lee \_?

In (8a) the filler is a nominal expression, but the gap is a non-finite VP. The *wh*-interrogative in (8b) shows that it is not normally possible to have a nominal filler associated with a VP gap, but in (8a) it is fine. We explore the HPSG approach to these matters in the following pages. In Section 2, we outline the basic HPSG approach to UDs. Then in Section 3, we focus on the nature of gaps, i.e. the bottom of the dependency, and in Section 4 we look more closely at the middle of UDs. In Section 5, we consider the top of UDs and highlight the variety of UD constructions. In Section 6, we look at resumptive pronouns. Then, in Section 7, we consider some further aspects of *wh*-interrogatives, including pied-piping and *wh*-in-situ phenomena, Section 8 deals with extraposition, and, in Section 9, we take a look at filler-gap mismatches. Finally in Section 10, we summarise the chapter, followed by an appendix comparing HPSG to SBCG.

# **2 The basic approach**

An analysis of UDs needs an account of gaps, of the structures at the top of UDs, and of the connection between them. Central to the HPSG approach is the feature SLASH, occasionally called GAP in some recent works, which provides information about the presence of UD gaps inside a constituent.<sup>1</sup> Much HPSG work assumes the feature geometry in (9), following Pollard & Sag (1994: Chapter 4):

(9) HPSG feature geometry: nonlocal and local features


As this indicates, SLASH is part of the value of the feature NONLOCAL. Its value is a set of *local* feature structures. If we use traditional category labels as abbreviations for local feature structures, we can say that a constituent containing an NP gap is [SLASH {NP}], a constituent containing a PP gap is [SLASH {PP}], and so on.

Turning to gaps, a central question is whether there is a phonologically empty element in the constituent structure or nothing at all. Both positions have been developed within HPSG, but probably the view that there is nothing at all in constituent structure is the more widely assumed position. We will adopt that for now and return to the issues in Section 3. Assuming this position, example (1a), repeated here as (10), will contain a V with just a single complement sister, namely the predicative PP *on the table*.

(10) What did you put \_ on the table?

Because the V node in Figure 1 contains an NP gap, it will be [SLASH {NP}], and so will the constituents that contain it, with the exception of the complete sentence. Thus, we have the schematic structure illustrated in Figure 1.

Obviously, we need to ask what ensures that the SLASH feature plays just the right role here. First, however, we need to say more about gaps.

Figure 1: Extraction by SLASH feature percolation

<sup>1</sup>The basic approach derives from the earlier Generalised Phrase Structure Grammar (GPSG) framework (Gazdar, Klein, Pullum & Sag 1985) and can be traced back to Gazdar (1981). The feature's name equally derives from this heritage, referring to the GPSG notation whereby X/Y stands for a category X containing a gap of category Y.

On the view of gaps we are focusing on here, they are only represented on argument structure, i.e. ARG-ST lists (see Abeillé & Borsley 2021: Section 4.1, Chapter 1 of this volume and Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). Thus, the verb *put* in (10) has a gap in its ARG-ST list and therefore only a PP in its COMPS list and in constituent structure. Gaps have the feature make-up given in (11):

(11) Representation of gaps, according to Pollard & Sag (1994: 161):

$$\begin{bmatrix}
\text{LOCAL} & \boxed{1} \\
\text{NONLOCAL} & \left[ \text{SLASH}\ \left\{ \boxed{1} \right\} \right]
\end{bmatrix}$$

Thus, *put* in (10) will have an element of this form in an ARG-ST list where 1 is the LOCAL value of an NP.

Returning now to SLASH, a widely assumed approach involves the following assumptions:

(12) a. The SLASH value of a head is normally the same as that of its arguments.
	- b. The SLASH value of a phrase is normally the same as that of its head.

We will consider how these ideas are formalised in Section 4. For now we will just discuss their implications for the analysis of (10). Essentially they mean that it has the following more elaborate analysis, as given in Figure 2.

Clause (12a) is responsible for the SLASH values on P and both Vs, while clause (12b) is responsible for the SLASH values on PP, VP, and the lower S. This approach to the distribution of SLASH crucially involves heads and is commonly said to be head-driven.
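To make the head-driven distribution in (12) concrete, here is a toy Python sketch of the relevant part of Figure 2; modelling SLASH values as frozensets and constituents as labelled nodes is an expository assumption, not the HPSG formalism.

```
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    slash: frozenset = frozenset()

def word_slash(arg_st):
    """Clause (12a): a word's SLASH is the union of its arguments' SLASH."""
    out = frozenset()
    for arg in arg_st:
        out |= arg.slash
    return out

def phrase_slash(head):
    """Clause (12b): a phrase's SLASH is normally that of its head."""
    return head.slash

# "What did you put _ on the table?": 'put' has an NP gap on its ARG-ST.
gap  = Node("NP-gap", frozenset({"NP"}))
subj = Node("NP-you")
pp   = Node("PP-on-the-table")
put  = Node("V-put", word_slash([subj, gap, pp]))  # SLASH {NP}
vp   = Node("VP", phrase_slash(put))               # SLASH {NP}
print(vp.slash)  # frozenset({'NP'})
```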

The lower S in Figures 1 and 2 is the head of the higher S, but they do not have the same value for SLASH. This is because they represent the top of the dependency.

Figure 2: Head-driven SLASH feature percolation

If information about gaps were available above the top of the dependency, it would be possible to have another filler higher in the tree, as in (13).

(13) \* What do you wonder what Kim saw \_?

The top of the dependency in Figures 1 and 2 is a head-filler phrase and the constraint on head-filler phrases needs to ensure that the higher S is [SLASH { }]. One might propose the following constraint:<sup>2</sup>

(14) Head-Filler Schema (singleton SLASH set):

$$\textit{head-filler-phrase} \Rightarrow \begin{bmatrix}
\text{SLASH} & \left\{ \right\} \\
\text{HD-DTR} & \boxed{1} \begin{bmatrix} \text{COMPS} & \left\langle \right\rangle \\ \text{SLASH} & \left\{ \boxed{2} \right\} \end{bmatrix} \\
\text{DTRS} & \left\langle \left[ \text{LOCAL}\ \boxed{2} \right], \boxed{1} \right\rangle
\end{bmatrix}$$

This says that a head-filler phrase is SLASH { } and has a head daughter which has a saturated COMPS list and has a single local feature structure in its SLASH set and a non-head daughter whose LOCAL value is the local feature structure in the SLASH set of the head. Standardly, however, a slightly more general constraint is assumed along the following lines:

<sup>2</sup>We use shorthands rather than full AVMs. For example SLASH is located under SYNSEM|NONLOC and COMPS under SYNSEM|LOC|CAT. See Abeillé & Borsley (2021: Section 3), Chapter 1 of this volume for details.

(15) Head-Filler Schema:<sup>3</sup>

$$\textit{head-filler-phrase} \Rightarrow \begin{bmatrix}
\text{SLASH} & \boxed{3} \\
\text{HD-DTR} & \boxed{1} \begin{bmatrix} \text{COMPS} & \left\langle \right\rangle \\ \text{SLASH} & \left\{ \boxed{2} \right\} \cup \boxed{3} \end{bmatrix} \\
\text{DTRS} & \left\langle \left[ \text{LOCAL}\ \boxed{2} \right], \boxed{1} \right\rangle
\end{bmatrix}$$

This allows the SLASH set of the head to contain more than one member and any additional members form the SLASH set of the whole phrase ( 3 ). This is necessary for an example like (16) from Chaves (2012: 473), where indices are used to link fillers and gaps.

 

(16) This is the person who I can't remember [which papers] I sent copies of \_ to \_ .

Examples of this form often seem unacceptable, but this is probably a processing matter; see Chaves (2012: Section 3) for discussion. See also Section 6 for long relativisation with resumption in Hausa or Modern Standard Arabic.
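The set-based bookkeeping of (15) can be rendered in the same toy terms: the filler binds one member of the head daughter's SLASH set, and any remainder, as in (16), keeps percolating. Again, the Python encoding is an expository assumption.

```
def head_filler(head_slash, filler_local):
    """Mother's SLASH under (15): the head daughter's SLASH minus the
    gap bound by the filler; any remainder percolates further."""
    assert filler_local in head_slash, "filler must match a gap in SLASH"
    return head_slash - {filler_local}

print(head_filler(frozenset({"NP"}), "NP"))           # frozenset(): all bound
print(head_filler(frozenset({"NP1", "NP2"}), "NP1"))  # {'NP2'}, as in (16)
```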

# **3 More on gaps**

We now look more closely at the nature of gaps. The central question here is: what exactly are gaps? We noted in the last section that it has been widely assumed that gaps are only represented in ARG-ST lists, but that some HPSG work assumes that they are empty categories, often called traces. There is a third possibility which might be considered, namely that gaps are represented in ARG-ST lists and in VALENCE lists, i.e. SUBJ and COMPS lists, but not in constituent structures. However, it seems that this position has rarely been considered. One complicating factor is that there seem to be differences between complement gaps and both subject and adjunct gaps. A consequence of this is that the question "what are gaps?" could have different answers for different sorts of gaps, and in fact different answers have sometimes been given.

Complement gaps seem to have had rather more attention than subject or adjunct gaps, perhaps because there are many different kinds of complements, hence many different kinds of complement gaps. We will look first at complement gaps, and in particular, the gap in (1), repeated here as (17).

(17) What did you put \_ on the table?

Probably the most widely assumed position is that gaps are only represented in ARG-ST lists (see Sag 1997: Section 4.1, Bouma, Malouf & Sag 2001: Section 2.2, Ginzburg & Sag 2000: Chapter 5.1 and Sag 2010: 508). On this view, the verb *put* will have the following syntactic properties:

(18) Representation of a slashed verb (traceless):


We ignore the COMPS feature and the issue of what ensures that the verb here has the same SLASH value as the gap. We will discuss the latter in the next section.

The view that gaps are empty categories was a feature of early HPSG work, notably Pollard & Sag (1994: Chapter 4), and it has been assumed in some more recent work, e.g. Levine & Hukari (2006: 191, 385), Borsley (2009), Borsley (2013: Section 4.2), and Müller (2004c). On this view, the VP will have the following structure:

Figure 3: Representation of a slashed VP (with trace)

Again we ignore the COMPS feature and how the VP here has the same SLASH value as the gap.

It is not easy to choose between these two approaches. One argument in favour of the first view, advanced, for example, in Bouma et al. (2001: Section 3.5.2), is that it makes it unsurprising that a gap cannot be one conjunct of a coordinate structure, as in the following:

	- b. \* Which of her books did you find [\_ and [a review of \_]]?

It is not obvious why this should be impossible if gaps are empty categories.<sup>4</sup>

A second argument in favour of a traceless approach comes from languages which morphologically treat slashed transitives on a par with intransitives, like Hausa (Crysmann 2005a) or Mauritian Creole French (Henri 2010). In Hausa and Mauritian, verbs morphologically register whether a direct object is realised locally or not: in both languages, a "short" form is used with locally realised direct objects, whereas the long form is used with intransitives as well as in the case of object extraction. Consider the following examples from Hausa, partially adapted from Newman (2000: 632–633):


Hausa verbs are lexically transitive or intransitive, and they are classified into one of seven morphological grades.<sup>7</sup> Intransitives only have a single form (A-form), which is characterised by a long vowel (in grade 1), cf. (20). Transitives, however, display an alternation depending on the mode of realisation of the direct object: if used intransitively, they pattern with intransitive verbs in using the A-form (long vowel in grade 1), but with an in situ direct object (21a), they obligatorily surface in the C-form (21b), which has a short vowel in grade 1. Once the direct object is extracted, we find the long vowel A-form again, in parallel to the intransitive use of transitives and true intransitives. In sum, the morphology of Hausa treats complement extraction on a par with argument suppression or lexical intransitives, i.e. as if the direct object complement simply were not there. Similar observations appear to hold for Mauritian (Henri 2010: Section 4.2.3). Thus, if nonlocal realisation corresponds to lexical valence reduction, the Hausa (and Mauritian) facts are straightforwardly accounted for, whereas the generalisation would be lost if gaps were considered phonologically empty syntactic elements.

<sup>4</sup>Coordination is a problem for any empty category, not just the empty categories that represent gaps in some HPSG work. Various empty categories have been proposed in the HPSG literature, most prominently the empty relativiser of Pollard & Sag (1994: Chapter 5). Sag et al. (2003: Section 15.3.5) propose that African American Vernacular English has a phonologically empty form of the copula. This analysis requires some mechanism to prevent this form from appearing as a conjunct. It is likely that a mechanism that can do this will also prevent the empty categories that represent gaps from being conjuncts.

<sup>5</sup>Newman (2000: 632)

<sup>6</sup>Newman (2000: 632)

<sup>7</sup>We restrict discussion here to grade 1, although the syntactic pattern is systematic across grades, only giving rise to different patterns of exponence. See the Hausa grammars by Newman (2000) and Jaggar (2001) for details, and Crysmann (2005a) for evidence in favour of a morphological treatment.

However, the lexical approach to argument extraction has some possibly non-trivial implications for other lexical sub-theories of HPSG that make crucial reference to valence lists, which include lexical theories of agreement and case. This is because gaps will not be present on the valence lists of word-level signs. The theory of ergativity proposed by Manning & Sag (1999: Section 5.2) in terms of mapping between ARG-ST and valence lists is actually formulated as constraints on lexemes, since e.g. the linking of the highest argument to the first element on COMPS (ergative subject) needs to be specified independently of whether this argument is realised by a local or a non-local dependency. The same holds of course for the linking of objects in accusative languages.<sup>8</sup>

Similar considerations apply to agreement: if agreement treats local and non-local arguments alike, it is clear that agreement controllers cannot be identified in a general fashion in terms of the valence features of word-level signs: thus, if agreement relations need to make reference to valence rather than argument structure, this can only be established at the level of lexemes.<sup>9</sup> The relevant evidence comes from languages where the highest argument on ARG-ST does not necessarily correspond to the highest grammatical function, i.e. SUBJ valence: while some ergative languages display agreement with the highest argument on ARG-ST, e.g. Udi (Harris 1984), Archi (Kibrik 1994) shows agreement with the absolutive argument, suggesting that SUBJ is the right place to establish the relation. In Nias (Crysmann 2009), we find agreement with SUBJ in the realis, and with the least oblique argument in the irrealis (ARG-ST). Finally, in Welsh, we observe a parallelism in the agreement between subjects of finite verbs and the objects of prepositions and non-finite verbs: according to Borsley (1989: Section 4), a unified treatment can be given if subjects of finite verbs are the first element on COMPS, an assumption that directly captures Welsh VSO word order.<sup>10</sup>

<sup>8</sup>Crysmann (2009) exploits the fact that extracted arguments do not appear on the valence lists of word-level signs and formulates local case assignment for Nias as a constraint on *word*, effectively exempting topicalised arguments from objective case assignment.

<sup>9</sup>Lexemes are basic lexical items. Lexemes of inflectable parts of speech are mapped to words. See Abeillé & Borsley (2021: Section 4.1), Chapter 1 of this volume for more on the notion of lexeme.

Given the broad empirical support for valence lists as one of the loci of case and agreement constraints, it is clear that these constraints must hold for lexemes, not words, under a traceless, lexical approach to unbounded dependencies.
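The variation just reviewed amounts to a language-particular choice of agreement controller. The following fragment is a deliberately crude illustration of the patterns cited above; the language keys and list encodings are assumptions for exposition only.

```
def agreement_controller(language, arg_st, subj, realis=True):
    """Pick the agreement controller per the patterns cited above."""
    if language == "udi":     # highest argument on ARG-ST
        return arg_st[0]
    if language == "archi":   # absolutive argument, via SUBJ
        return subj[0]
    if language == "nias":    # SUBJ in the realis, ARG-ST in the irrealis
        return subj[0] if realis else arg_st[0]
    raise ValueError(language)

print(agreement_controller("nias", ["erg-NP", "abs-NP"], ["abs-NP"],
                           realis=False))   # 'erg-NP'
```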

We turn now to subject gaps. Here a central question is: "how similar to or different from complement gaps are they?" The following illustrate a well-known contrast, which suggests that they may be significantly different:

	- b. Who do you think \_ saw Kim?

The examples in (22) show that a gap is possible in object position in a complement clause whether or not it is introduced by *that*. In contrast, the examples in (23) suggest that a gap is only possible in subject position in a complement clause if it is not introduced by *that*. Pollard & Sag (1994: Chapter 4.4) approach this contrast by stipulating that gaps cannot appear in subject position. This accounts for the ungrammaticality of examples like (23b). Examples like (23a) are accounted for by allowing verbs like *think* to take a VP complement and have a non-empty value for SLASH. Ginzburg & Sag (2000: Chapter 5.1.3) offer a very different account, in which subject gaps appear both in ARG-ST lists and SUBJ lists. They suggest that examples like (23b) are ungrammatical because *that* cannot combine with a constituent which has a non-empty SUBJ list.

An important fact about subject gaps is that they are not completely impossible in a complement clause introduced by *that*. In particular, they are acceptable if *that* is followed by an adverbial constituent. The following illustrates:

<sup>10</sup>Borsley (2016: Section 5.4) argues on rather different grounds that agreement in the Caucasian language Archi involves constraints on constituent structure, which will favour a trace-based perspective on extraction.


(24) Who did you say that tomorrow \_ would regret his words?

Ginzburg & Sag (2000: Chapter 5.1.3) offer an account of such examples, but Levine & Hukari (2006: Chapter 2.3.2) argue that it is unsatisfactory. More generally, they argue that subject gaps are like complement gaps in various respects and therefore should have the same basic analysis. They propose an analysis with an empty category for both types of gap. Thus, their approach differs both from the widely assumed approach, which has no empty categories, and the approach of Pollard and Sag, which has them in complement position but not in subject position.

We turn now to adjunct gaps. It is not obvious that there is a gap in examples like (6) repeated as (25), because no obligatory constituent is missing.<sup>11</sup>

$$\text{(25)}\quad \begin{cases} \text{Where} \\ \text{When} \\ \text{How} \\ \text{Why} \end{cases} \text{ did you talk to Lee \_?}$$

However, Hukari & Levine (1995) show that such examples may display what are often called *extraction path effects*, certain phonological or morphosyntactic phenomena which appear between a gap and the associated higher structure (see the discussion of example (30) on page 550). Hence, it seems that they must involve a filler-gap dependency, on a par with examples with a complement gap.

Of course, there are a variety of positions that are compatible with this conclusion. Bouma et al. (2001: 12) and Ginzburg & Sag (2000: 168, fn. 2) propose that verbal adjuncts are optional extra complements. On this view, the gaps in the examples in (25) are complement gaps. Levine (2003) and Levine & Hukari (2006: Chapter 3.5–3.6) argue against this approach with examples like the following:

(26) In how many seconds flat do you think that [Robin found a chair, sat down, and took off her logging boots]?

This is a query about the total time taken by three distinct events. Levine & Hukari propose a fairly traditional analysis of verbal adjuncts in which they are modifiers of VP, and combine this with the assumption that gaps are empty categories. The interpretation of examples like (26) follows straightforwardly on this analysis. If indeed argument extraction contrasts with adjunct extraction in terms of whether the gap is introduced lexically (on ARG-ST) or phrasally, this may provide a direct account of the fact that the use of a resumptive strategy in extraction is by and large restricted to arguments. As discussed by Crysmann & Reintges (2014), resumptives are obligatory for arguments in Coptic, whereas gap-type extraction is the only possibility for modifiers.

<sup>11</sup>This position was initially taken in Pollard & Sag (1994: 176–180).

A rather different approach is developed in Chaves (2009). Like Levine & Hukari (2006: Chapter 3), he assumes that verbal adjuncts are modifiers of VP, but he rejects the idea that gaps are empty categories. He shows in particular that the possibility for a filler to correspond to a group is neither limited to adjunct extraction nor to events, but may also be observed with NP complements whose gaps are properly contained within each conjunct, as shown by the following examples:

	- b. [[Which pilot]*<sup>i</sup>* and [which sailor]*j*] will Joan invite \_ *<sup>i</sup>* and Greta entertain \_ *<sup>j</sup>* (respectively)?

He suggests that the treatment of coordination must be relaxed in such a way as to permit the creation of group individuals and group events on the mother's SLASH where the daughters' SLASH values contain the individual or event variables of the group's members. This provides an account of complement extraction as in (27), but it also provides a straightforward account of the cumulative scoping facts in (26).

# **4 The middle of the dependency**

In the middle of an unbounded dependency we typically have a phrase (or a clause) with the same value for SLASH as a non-head daughter. As we noted in Section 2, it is widely assumed that this relation is mediated by the head daughter. The SLASH value of a head is normally the same as that of its arguments, and the SLASH value of a phrase is normally the same as that of its head. However, as we will see, this head-driven approach to the distribution of SLASH hasn't always been adopted.

Central to the head-driven approach is the SLASH Amalgamation Principle, which we can formulate as follows, following Ginzburg & Sag (2000: 199):

(28) SLASH Amalgamation Principle:

$$\textit{word} \Rightarrow \begin{bmatrix}
\text{SYNSEM} & \left[ \text{NONLOC} \left[ \text{SLASH}\ /\ \boxed{1} \cup \dots \cup \boxed{n} \right] \right] \\
\text{ARG-ST} & \left\langle \left[ \text{NONLOC} \left[ \text{SLASH}\ \boxed{1} \right] \right], \dots, \left[ \text{NONLOC} \left[ \text{SLASH}\ \boxed{n} \right] \right] \right\rangle
\end{bmatrix}$$


This is a default constraint, as indicated by the '/'. Essentially, it says that by default the SLASH value of a word is the union of the SLASH values of its arguments. Because it is merely a default, the constraint will accommodate examples like the following:

(29) The professor is hard [to talk to \_].

Here, the adjective *hard* takes an infinitival complement with a non-empty SLASH feature, but this SLASH value is not passed on any further; rather, it is coindexed with the subject of the adjective.<sup>12</sup>

To ensure that the SLASH value of a phrase is normally the same as that of its head, much work employs a Slash Inheritance Principle, which stipulates that a phrase and its head have the same value for SLASH except at the top of a dependency (see, e.g. Bouma et al. 2001: 20). An alternative approach developed in Ginzburg & Sag (2000: Chapter 5.1) uses the Generalised Head Feature Principle for this purpose.<sup>13</sup> This says that a headed phrase and its head daughter have the same SYNSEM values unless some other constraint requires something different. Among other things, this ensures that a headed phrase and its head daughter normally have the same value for SLASH.

One argument in favour of a head-driven approach to the distribution of SLASH is so-called extraction path effects, certain phonological or morphosyntactic phenomena which appear between a gap and the associated higher structure (see Hukari & Levine 1995; Bouma et al. 2001: Section 3.2). Irish provides one of many examples that have been discussed. In Irish, the verbal particle *goN* only occurs with structures that do not contain gaps, while *aL* only occurs between a filler and a gap. The following illustrate (Bouma et al. 2001: 26):<sup>14</sup>

	- b. an fear aL shíl mé aL bheadh ann
	     the man PRT thought I PRT would.be there
	     'the man that I thought would be here'

<sup>12</sup>The non-local nature of *tough*-constructions appears to be a peculiarity of English: similar constructions in German and French do exist, but they feature local (passive-like) dependencies. See Abeillé et al. (1998) and Aguila-Multner (2018) for French, as well as Müller (2002: Section 3.1.5) for German. Even for English, the unboundedness of the construction has been challenged: Grover (1995) questions the acceptability of English *tough*-constructions involving a UDC out of finite clauses and suggests a local account instead.

<sup>13</sup>See also Abeillé & Borsley (2021: 24), Chapter 1 of this volume for an explicit formulation of the constraint.

<sup>14</sup>In some accounts these particles are taken to be complementisers. The N indicates that *go* triggers nasal mutation while L indicates that *a* triggers lenition.

Within a head-driven approach to SLASH, this is just a contrast between a verb which is [SLASH { }] and a verb which is [SLASH {[ ]}], and is completely unproblematic.

Early HPSG assumed an approach to SLASH which was not head-driven (see Pollard & Sag 1994: Chapter 4), and related approaches are assumed in Levine & Hukari (2006) and Chaves (2012: 497). A problem with a head-driven approach is that it says nothing about examples where an unbounded dependency crosses the boundary of a non-headed phrase such as a coordinate structure. Thus, it does not deal with examples of asymmetric coordination like the following:

	- b. How many lakes can we [[destroy \_] and [not arouse public antipathy]]?

Early HPSG (Pollard & Sag 1994: Chapter 4) accounts for the distribution of SLASH by means of the Nonlocal Feature Principle, and related principles are proposed by Levine & Hukari (2006: 354) and Chaves (2012: 497). These principles ensure that the SLASH value of a phrase reflects the SLASH values of all its daughters (using set union) and apply equally to headed and non-headed structures. Thus, the examples in (31) are no problem for these latter approaches. However, they seem to require some extra element to handle extraction path effects. So, it is not easy to choose between these approaches and the head-driven approach.
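In the same toy set terms used earlier, a union-based principle simply collects SLASH values over all daughters, headed or not; the sketch below is illustrative only.

```
def mother_slash(daughter_slashes):
    """Nonlocal-Feature-Principle style: the mother's SLASH is the
    set union over all of its daughters."""
    out = frozenset()
    for s in daughter_slashes:
        out |= s
    return out

# ATB extraction: both conjuncts contain the same NP gap; the union is
# still a single-membered set, so one filler can bind both gaps.
print(mother_slash([frozenset({"NP"}), frozenset({"NP"})]))  # {'NP'}
```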

A further point that we should emphasise here is that both approaches to the distribution of SLASH allow structures like the one in Figure 4.

Figure 4: Across-the-board (ATB) extraction: conflation of SLASH values

In other words, both allow more than one daughter of a phrase with a non-empty SLASH value to have the same value. This means that we expect structures in which a single filler is associated with more than one gap. Thus, examples like the following are no problem:

	- b. Which person did you [invite \_ [without thinking \_ would actually come]]?

Example (32a), where the two gaps are in a coordinate structure, is standardly said to be a case of across-the-board extraction (Ross 1967; Williams 1978). (32b) is traditionally seen as involving an ordinary gap followed by a parasitic gap. However, for HPSG, all these gaps have essentially the same status (see Levine & Hukari 2006 and Chaves 2012 for extensive discussion).

# **5 The top of the dependency: The diversity of unbounded dependency constructions**

We now look more closely at the top of unbounded dependencies. This is where most of the diversity of unbounded dependency constructions resides. They are largely the same at the bottom of the dependency and in the middle, but at the top of the dependency, they differ from each other in a variety of ways. We noted at the outset that the distinctive higher structure in an unbounded dependency construction may contain a filler, but does not always. In other words, it may be a head-filler phrase, but it may not, and there are a number of other possibilities. Moreover, head-filler phrases can have quite different properties in different constructions.

In the introduction to this chapter we mentioned *wh*-interrogatives and relative clauses<sup>15</sup> as two examples of unbounded dependency constructions. In English the former always involve a head-filler-phrase,<sup>16</sup> while the latter sometimes do but sometimes do not. There are *wh*-relatives and non-*wh*-relatives of various kinds. English *wh*-interrogatives and *wh*-relatives look quite similar. They seem to involve many of the same lexical items: *who*, *which*, *when*, *where*, *why*, and, as the following show, both may be finite or non-finite:

	- b. I wondered [who to talk to \_].
	- b. someone [to whom to talk \_]

<sup>15</sup>See also Arnold & Godard (2021), Chapter 14 of this volume for a more detailed discussion of relative clauses.

<sup>16</sup>On some analyses of examples like the following, *who* is just a subject and not a filler:

(i) Who knows the answer?

However, for other work, this is a filler just like the *wh*-elements discussed here.

But there are differences. *Wh*-interrogatives, but not *wh*-relatives, allow *what* and *how*:


In *wh*-interrogatives, *which* combines with a following nominal except in cases of ellipsis. Thus, in (37), *book* is necessary unless it is clear that books are under discussion.

(37) Which book did Kim buy \_?

Notice also that non-finite *wh*-relatives only allow a PP as a filler. Thus, (38) is not possible as an alternative to (34b).

(38) \* someone [who to talk to \_]
Thus, the fillers in the two constructions differ in a number of ways. The heads also differ in that *wh*-interrogatives have auxiliary + subject order in main clauses (unless the *wh*-phrase is the subject), something which does not occur in *wh*-relatives.

*Wh*-interrogatives and *wh*-relatives are not the only unbounded dependency constructions that involve a head-filler phrase. Topicalisation sentences such as the following are another:

	- b. To London, I went \_.

Unlike *wh*-interrogatives and *wh*-relatives, these are always finite. Also required to be finite are what have been called *the*-clauses (Borsley 2004; Sag 2010: 490– 494, 524–527; Borsley 2011; Abeillé & Chaves 2021: Section 3.3, Chapter 16 of this volume), the components of comparative correlatives such as (40).

(40) The more I read \_, the more I understand \_.

*The*-clauses have the unusual property that they may contain the complementiser *that*:

(41) The more that I read \_, the more that I understand \_.

Obviously, this is not possible in *wh*-interrogatives and *wh*-relatives.

(42) a. \* I wonder [who that Lee saw \_].

b. \* the man [who that Lee saw \_]

Within HPSG the obvious approach to the sorts of facts we have just highlighted involves a number of subtypes of the type *head-filler-phrase*, as in Figure 5.

Figure 5: Hierarchy of head-filler phrases (*head-filler-phrase* with the subtypes *wh-interr-cl*, *wh-rel-cl*, *top-cl*, and *the-cl*)

As was noted in Abeillé & Borsley (2021), Chapter 1 of this volume, much HPSG work assumes two distinct sets of phrase types. Assuming this position, *wh-interr-cl* will not just be a subtype of *head-filler-ph(rase)* but also a subtype of *interr-cl*, the type *wh-rel-cl* will also be a subtype of *rel-cl*, and *top-cl* and *the-cl* will both be subtypes of *decl-cl*. This gives the type hierarchy in Figure 6.

Figure 6: Hierarchy of extraction clause types (preliminary)

Constraints on *interr-cl* will capture the properties that all interrogatives share, most obviously interrogative semantics. Constraints on *rel-cl* will capture what all relatives have in common, especially modifying an appropriate nominal constituent.<sup>17</sup> Finally, constraints on *decl-cl* will capture the properties of declaratives, especially declarative semantics. Constraints on *wh-interr-cl* and *wh-rel-cl* will ensure that their fillers take the appropriate form. Constraints on *top-cl* and *the-cl* will restrict their fillers and also require their heads to be finite. Further complexity is probably necessary to handle all the facts noted above. To ensure that non-finite *wh*-relatives only allow a PP filler while finite *wh*-relatives allow either an NP or a PP filler, it is probably necessary to postulate two subtypes of *wh-rel-cl*. As for the fact that *the*-clauses may contain the complementiser *that*, one way to deal with this is to postulate a subtype of *head-filler-phrase*, *standard-head-filler-phrase*, with *wh-interr-cl*, *wh-rel-cl*, and *top-cl* as its subtypes. This new type will be subject to a constraint preventing its head from containing a complementiser. The type *the-cl* will not be a subtype of this new type and hence will be able to contain a complementiser (see Borsley 2011: 13–15 for discussion). All this suggests the type hierarchy in Figure 7.

Figure 7: Hierarchy of extraction clause types (final)

This is complex, but then the facts are complex, as we have seen. Crucially, such a hierarchy allows a straightforward account of both the similarities and the differences among these constructions.

<sup>17</sup>Non-restrictive relatives can also modify various kinds of non-nominal constituents. See Arnold (2004), Arnold & Borsley (2008), and Arnold & Godard (2021: Section 3.4), Chapter 14 of this volume.
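Cross-classification of this kind is naturally mirrored by multiple inheritance. The following Python transliteration of Figure 7 is purely illustrative; the class names mirror the types, and the constraints are indicated only in comments.

```
class HeadFillerPhrase: ...
class StandardHeadFillerPhrase(HeadFillerPhrase): ...  # head contains no complementiser
class InterrCl: ...  # interrogative semantics
class RelCl: ...     # modifies a nominal constituent
class DeclCl: ...    # declarative semantics

class WhInterrCl(StandardHeadFillerPhrase, InterrCl): ...
class WhRelCl(StandardHeadFillerPhrase, RelCl): ...
class TopCl(StandardHeadFillerPhrase, DeclCl): ...
class TheCl(HeadFillerPhrase, DeclCl): ...  # may contain 'that'

# the-clauses are head-filler phrases but not 'standard' ones, so the
# complementiser ban does not apply to them:
assert issubclass(TheCl, HeadFillerPhrase)
assert not issubclass(TheCl, StandardHeadFillerPhrase)
```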

We turn now to cases where there is no filler. We start with the so-called *tough* construction, exemplified by (29), repeated here as (43).

(43) The professor is hard [to talk to \_].

Here, there is a gap following the preposition *to*, and the initial NP *the professor* is understood as the object of *to*. But this NP is not a filler, but a subject. Like any subject, it is preceded by an auxiliary in an interrogative:

(44) Is the professor hard [to talk to \_]?

Moreover, it is clear that it cannot share a local feature structure with the gap, since it is in a position associated with nominative case, whereas the gap is in a position associated with accusative case. This suggests that adjectives like *hard* may take an infinitival complement with a SLASH value containing a nominal local feature structure which is coindexed with its subject. The coindexing will

#### Robert D. Borsley & Berthold Crysmann

ensure that the subject has the right interpretation without getting into difficulties over case. It seems, then, that we need something like the lexical description in (45) in order to account for *hard* in examples like (43) and (44):

(45) Lexical representation of *tough* adjectives (preliminary):


But there is more to be said here. *Hard* and its infinitival complement are the top of a dependency. It is essential that the AP *hard to talk to* should not have the same SLASH value as the infinitival complement *to talk to*. How this should be prevented depends on what approach to the distribution of SLASH values is assumed. However, if this involves a default SLASH Amalgamation Principle of the kind discussed in Section 4, it is a fairly simple matter. A default SLASH Amalgamation Principle ensures that the SLASH value of a word is normally the same as the SLASH value of its arguments. We can override the principle in the present case by giving adjectives like *hard* lexical descriptions of the following form:

(46) Lexical representation of *tough* adjectives (final):

This ensures that the SLASH value of such adjectives is the SLASH value of the infinitival complement minus the NP that is coindexed with its subject. Where this NP is the only item in the complement's SLASH value, the adjective will be [SLASH { }], and so will the AP that it heads. However, it is possible to have an additional item in the SLASH value, as in the following example, adapted from Pollard & Sag (1994: 169):

(47) Which violin is this sonata [easy to play \_ on \_]?

Here, *which violin* is understood as the object of *on* and *this sonata* as the object of *play*. The infinitival complement *to play on* will have two items in its SLASH set, one associated with *which violin* and one associated with *this sonata*. The constraint in (46) will ensure that only the former appears in the SLASH set of *easy*, and hence only this appears in the SLASH set of *easy to play on*.
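In the toy set notation used earlier, the lexical binding effected by (46) amounts to set subtraction; the sketch below is illustrative only.

```
def tough_bind(comp_slash, bound_np):
    """SLASH of 'hard'/'easy' under (46): the infinitival complement's
    SLASH minus the NP gap coindexed with the adjective's subject."""
    return comp_slash - {bound_np}

# (43): the only gap is bound, so the AP ends up SLASH { }.
print(tough_bind(frozenset({"NP[acc]"}), "NP[acc]"))  # frozenset()
# (47): the gap bound to the subject ('this sonata') is removed; the
# other gap ('which violin') stays in SLASH for the filler higher up.
print(tough_bind(frozenset({"NP-sonata", "NP-violin"}), "NP-sonata"))
```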

The term "lexical binding of SLASH" is often applied to situations like this in which a lexical item makes some structure the top of a dependency. This is a plausible approach to adjectives like *hard* and also to adjectives modified by *too* or *enough*, as in the following:

	- b. Lee is important enough for you to talk to.

Lexical binding is also a plausible approach to relative clauses which have not a filler but a complementiser. This may include English *that* relatives such as the one in (49) (although some HPSG work, e.g. Sag 1997: Section 5.4, has analysed *that* as a relative pronoun and hence a filler):

(49) the man [that you talked to \_]

If relative *that* is a complementiser, and complementisers are heads, as in much HPSG work, it can be given a lexical description like the one in (50):

(50) Lexical representation of relative complementiser *that*:


This says that *that* takes a finite clause as its complement and modifies an NP, that the SLASH value of the clause includes an NP which is coindexed with the antecedent noun selected via MOD, and that any additional members of the complement's SLASH set form the SLASH set of *that*. Normally there will be no other members and *that* will be [SLASH {}].<sup>18</sup>

Further issues arise with zero relatives, which contain neither a filler nor a complementiser, such as the following English example:

(51) the man [you talked to \_]

For Sag (1997: Section 6), these are one type of non-*wh*-relative and are required to have a MOD value coindexed with an NP in the SLASH value of the head daughter. But an issue arises about semantics. Assuming the main verb in a zero relative has the same semantic interpretation as elsewhere, a zero relative will have

<sup>18</sup>This is essentially the approach that is taken to relatives in Modern Standard Arabic in Alqurashi & Borsley (2012).


clausal semantics and not the modifier semantics that one might think is necessary for a nominal modifier. Sag's solution is to propose a special subtype of *head-adjunct-phrase* called *head-relative-phrase*, which allows a relative clause with clausal semantics to combine with a nominal and be interpreted in the right way. One might well wonder how satisfactory this approach is.

Sag (2010: Section 5.4) shows that it is a simple matter to assign modifier semantics to a relative clause where the basic clause is the daughter of some other element, as it is when there is a filler or a complementiser. The basic clause can have clausal semantics, and the mother can have modifier semantics. This suggests that zero relatives, too, might be analysed as daughters of another element with modifier semantics. One might do this, as Sag (2010: 531) notes, with a special unary branching phrase type (Müller 1999b: Section 10.3.2). Alternatively, one might postulate a phonologically null counterpart of relative *that*.<sup>19</sup>

There are various other issues about the top of the dependency. Consider, for example, cleft sentences such as (52).

(52) It was on the table that he placed the book \_.

Clefts consist of *it*, a form of *be*, a focused constituent, and a clause with a gap. In (52) the focused constituent is a PP and so is the gap. It looks, then, as if the focused constituent shares its main properties with the gap in the way that a filler would. However, there are also clefts where it is clear that the focused constituent does not share an index with the gap. Consider e.g. the following:

(53) It's me that \_ likes beer.

Here the focused constituent is first person singular, but the gap is third person singular, as shown by the form of the following verb. Given the standard assumption that person, number and gender features are a property of indices, it follows that they cannot have the same index. There are important challenges here.

Agreement in German may shed some more light on this:

(54) a. Da habe ich, der / die sonst immer rechtzeitig kommt, doch tatsächlich verschlafen.
there have.1SG I who.SG.M / who.SG.F otherwise always on.time come.3SG indeed verily overslept
'I, who is otherwise always on time, have indeed overslept.'

<sup>19</sup>This is the approach that is taken to zero relatives in Modern Standard Arabic in Alqurashi & Borsley (2012: Section 4).

b. Da habe ich, der ich sonst immer rechtzeitig komme, doch tatsächlich verschlafen.
there have.1SG I who.SG.M I otherwise always on.time come.1SG indeed verily overslept
'I, who is otherwise always on time, have indeed overslept.'

In (54a), we find a reduced agreement pattern in number and gender between the relative pronoun and the antecedent noun, to the exclusion of person. Within the relative clause, however, we find full person/number subject agreement on the verb. In (54b), by contrast, the relative pronoun is post-modified by the pronoun *ich* 'I', triggering full agreement with both the antecedent noun and the embedded verb. French, for its part, observes full agreement of all three INDEX features:

(55) C'est moi qui suis venu(e).
it's me who am come.M/F
'It's me who came.'

Thus, relative pronouns and complementisers seem to differ cross-linguistically as to the features which show agreement with the antecedent.

Also quite challenging are free relatives. They look rather like head-filler phrases. The initial constituent of a free relative behaves like a filler, reflecting the properties of the gap.

	- b. whichever students you think know/\*knows the answer

But the initial constituent also behaves like a head, determining the distribution of the free relative.

	- b. \* Kim will go what(ever) Lee buys.

In case languages like German, the matching effect generally includes case specifications (Müller 1999a).

Most work on free relatives has assumed that the initial constituent is a filler and not a head (Groos & van Riemsdijk 1981; Grosu 2003) or a head and not a filler (Bresnan & Grimshaw 1978). But the obvious suggestion is that it is both a filler

and a head, a position espoused in Huddleston & Pullum (2002: Chapter 12.6). This idea can be implemented within HPSG by analysing free relatives and headfiller phrases as subtypes of *filler-phrase*, as shown in Figure 8. See Borsley (2020) for an application of this approach to Welsh.

*filler-phrase*, with subtypes *head-filler-phrase* and *free-relative*

Figure 8: Hierarchy of filler phrases

*filler-phrase* will be subject to a constraint like that proposed earlier for head-filler phrases except that it will say nothing about the head daughter. *head-filler-phrase* will be subject to a constraint identifying the second daughter as the head, while *free-relative* will be subject to a constraint identifying the first daughter as the head (among other things).<sup>20</sup>

Naturally, there may be complications here. German, for example, has some free relatives in which the case of the *wh*-element differs from that which the position of the free relative leads one to expect: e.g. free relatives with a dative or PP filler can be used in contexts where a less oblique argument is required, like a nominative or accusative NP (Bausewein 1991: Section 3). This looks like a problem for the idea that the initial constituent is a head, but it may not be if we adopt the Generalised Head Feature Principle of Ginzburg & Sag (2000: 33) and regard the difference in HEAD and/or CASE values between head daughter and mother as specific overrides enforced by the free-relative rule.<sup>21</sup>

# **6 Resumptive pronouns**

Ever since Vaillette (2001), resumption has been treated as an unbounded dependency within HPSG, on a par with SLASH dependencies, rather than as a case of anaphoric binding. The main motivation for treating resumption similarly to extraction lies in the fact that in a variety of languages dependencies involving a

<sup>20</sup>The constraint on free relatives will also need to ensure that the first daughter takes the appropriate form and that the second daughter is finite.

<sup>21</sup>Müller (1999a) pursues a rather different approach to German free relatives, in which the initial constituent is not a head. Differences between the initial constituent and the free relative are unproblematic for this approach, but it needs a mechanism to account for the similarities between them.

pronominal at the bottom of the dependency behave similarly to UDCs involving a gap at the extraction site.

Vaillette (2001) investigates resumption in Hebrew and shows on the basis of across-the-board (ATB) extraction, parasitic gaps, and crossover that resumptive dependencies are indistinguishable from gap dependencies except for their reduced sensitivity to extraction islands. In order to reconcile the UDC-like properties of resumption with the difference in island sensitivity, he introduces a dedicated non-local feature RESUMP. While using separate features for resumptive pronouns and gaps easily makes them distinguishable for the purposes of island constraints, it certainly has the drawback that the formulation of the ATB constraint becomes quite cumbersome. The following example illustrates the mixing of gaps and resumptives in ATB extraction in Hebrew:


Subsequent work on Persian (Taghvaipour 2005), Hausa (Crysmann 2012), and Welsh (Borsley 2013) essentially follows Vaillette, using ATB extraction as the main indicator for treating resumptive dependencies in a similar way to gap dependencies. What all these works have in common is that they rely on a single non-local feature, namely SLASH for both types of dependencies. In particular, these authors argue that mixing of strategies, as illustrated in (59) for Hebrew and in (63) for Hausa, suggests that both extraction strategies should be captured using a single non-local feature, i.e. SLASH. Despite this commonality, however, approaches differ as to how gap and resumptive dependencies are distinguished, if at all.

In his work on Welsh unbounded dependencies, Borsley (2010) observes that the choice between gap and resumptive pronoun is essentially determined by properties of the immediate environment of the bottom of the dependency: i.e. while possessors of nouns and complements of prepositions require a resumptive element when extracted, subjects, as well as direct objects of finite and non-finite verbs, only extract by means of filler-gap dependencies. Thus, the distribution of gaps vs. resumptives is practically disjoint.

Furthermore, he reports evidence that resumptives and gaps also pattern alike with respect to island constraints: while extraction out of the clausal complement

<sup>22</sup>Sells (1984: 78)


in a complex NP is fine, with either a gap or a resumptive at the bottom, extraction out of a relative clause leads to ungrammaticality, again independently of whether we find a gap or a resumptive.<sup>23</sup>

(60) a. Dyma 'r dyn y credodd Dafydd [y si [y gwelodd Mair (o)]]. (Welsh)
here.is the man PRT believe.PAST.3SG Dafydd the rumour PRT see.PAST.3SG Mair he
'Here's the man who David believed the rumour that Mair saw.'

b. Dyma 'r dyn y credodd Dafydd [y si [y cest ti 'r llythyr 'na ganddo (fo)]].
here.is the man PRT believe.PAST.3SG Dafydd the rumour PRT get.PAST.2SG you the letter DEM with.3SG.M him
'Here's the man who David believed the rumour that you got that letter from.'

c. \* Dyma 'r ffenest darais i ['r bachgen [dorrodd (hi) ddoe]].
that.is the window hit.PAST.1SG I the boy break.PAST.3SG she yesterday

Moreover, with respect to the across-the-board (ATB) constraint, resumptives and gaps show the same behaviour as observed for Hebrew, easily permitting mixing. In addition, Welsh also has certain extraction path effects which are the same in gap and resumptive dependencies (see Borsley 2010 for details).

Given that the distribution of gaps and resumptives is regulated by the locally selecting head at the bottom of the dependency and that there is no need to distinguish the two types of dependencies along the extraction path (middle), Borsley (2010: 97) formulates what is probably the most simple and straightforward approach to resumption. In essence, he proposes "that we need structures in which a slashed preposition or noun has not a slashed argument but a pronominal argument coindexed with its SLASH value". Consequently, he extends Slash Amalgamation to optionally include a SLASH element coindexed with an unslashed pronominal argument. This move licenses Welsh resumptives in a structure like the one in Figure 9 below.

Thus, the only difference between gaps and resumptives on his account is that the former give rise to a reentrancy of an element in SLASH with a LOCAL value on ARG-ST, whereas the latter merely involve reentrancy of INDEX values (between an NP *local* on SLASH and an NP *synsem* on ARG-ST).
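For concreteness, a slashed preposition with an unslashed pronominal argument can be sketched as follows (our notation, simplifying the structure in Figure 9); only the index $\boxed{1}$ is shared between the SLASH element and the pronominal argument, not the full LOCAL value:

$$\text{P}\begin{bmatrix} \text{ARG-ST} & \left\langle \text{NP}\left[ \textit{ppro} \right]_{\boxed{1}} \right\rangle \\ \text{SLASH} & \left\{ \text{NP}_{\boxed{1}} \right\} \end{bmatrix}$$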

<sup>23</sup>The examples are from Borsley (2010: 91–92).

Figure 9: Representation of Welsh resumptives

The respective distribution of gaps and resumptives is finally accounted for by means of constraints on the binding-theoretical status of the element at the bottom of the dependency, i.e. *ppro* for resumptives and *npro* for gaps. See Müller (2021a), Chapter 20 of this volume on Binding Theory in HPSG.

Borsley's decision to locate the resumptive function on the selecting head, rather than on the pronominal, not only provides a good match for the Welsh data, but it also addresses McCloskey's generalisation (McCloskey 2002: 192) that resumptives are always the ordinary pronouns, since no lexical ambiguity between slashed and unslashed pronouns is involved.<sup>24</sup>

In contrast to Borsley, who developed his theory of resumption on the basis of a language where the distribution of gaps vs. resumptives is entirely regulated by the immediate local environment and no difference in island sensitivity could be observed, Crysmann (2012) developed an alternative account for Hausa, a language where the distributions of gaps and resumptives partially overlap at the bottom of the dependency and where resumptive dependencies observe different locality constraints when compared to filler-gap dependencies.

Hausa patterns with a number of resumptive languages, including Welsh, in that use of a resumptive element is obligatory for complements of a preposition or the possessor of a noun. With direct and indirect objects, however, both resumptives and gaps are possible, as Jaggar's (2001: 534) examples in (61) show:

(61) a. mutā̀nên dà sukà ƙi sayar wà \_ dà àbinci sukà fìta (Hausa)
men REL 3.P.CPL refuse sell to with food 3.P.CPL left
'The men they refused to sell food to left.'

<sup>24</sup>Cf. e.g. Abeillé & Godard (2007: 54–55) for an ambiguity approach, treating reentrancy of LOCAL and SLASH as optional for French pronouns.


b. mutā̀nên dà sukà ƙi sayar *musù* dà àbinci sukà fìta
men REL 3.P.CPL refuse sell to.them with food 3.P.CPL left
'The men they refused to sell food to left.'

In (61), both the bare dative marker *wà* 'to' (with a gap) and the dative pronoun *musù* 'to.them' are possible.

Moreover, gap and resumptive dependencies do behave differently with respect to strong islands: while extraction out of a relative clause or *wh*-island is impossible for gap dependencies, relativisation out of these islands is perfectly fine with resumptives.

(62) Gā̀ tābōbîn dà Àli ya san mùtumìn dà zâi yī *musù* / \*wà \_ kwālī<sup>25</sup> (Hausa)
here.is cigarettes REL Ali 3.S.M.CPL know man REL 3.S.M.FUT do to.them / to box
'Here are the cigarettes that Ali knows the man that (he) will make a box for.'

Crysmann (2012) further emphasises that relativisation (which may escape strong islands) resembles anaphoric relations, whereas filler-gap dependencies, as observed with *wh*-fronting, require matching of category as well. He therefore correlates relative complementisers and resumptives with minimal INDEX sharing, whereas filler-head structures, as well as gaps, require sharing of entire LOCAL values: while filler-head structures impose this stricter constraint at the top of the dependency, gaps obviously do so at the bottom. In order to express constraints on locality, Crysmann (2012; 2016) proposes that SLASH elements (of type *local*) should be distinguished as to their weight, cf. Figure 10: while the type *local* always minimally includes indexical information, its subtypes *full-local* and *weak-local* differ as to the amount of additional information that must or must not be present. For *full-local*, which is the appropriate value introduced by *synsem* (cf. Figure 11), this includes categorial and full semantic information, whereas precisely the categorial information is excluded for *weak-local*.
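Schematically, the weight distinction can be summarised as follows (a simplification of the hierarchy in Figure 10):

$$\textit{local}\ \begin{cases} \textit{weak-local} & \text{indexical information only, no categorial information} \\ \textit{full-local} & \text{indexical, categorial and full semantic information} \end{cases}$$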

The hierarchy of *local* types provides for the possibility that *local* types on SLASH may only be partially specified: while gaps and filler-head structures require full reentrancy of a (*full-local*) LOCAL value, resumptives may be non-committal with respect to the weight distinction, only imposing the minimal index-sharing constraint. This ensures that both resumptives and gaps can be found at

<sup>25</sup>Tuller (1986: 84)

Figure 10: Hierarchy of *local* (Crysmann 2016: 202)

Figure 11: Hierarchy of *synsem* objects (Crysmann 2016: 202)

the bottom of a strong UDC with e.g. a *wh*-filler. Conversely, islands can narrow down the nature of SLASH elements to only pass on a SLASH set of *weak-local*, such that resumptives, but not gaps, will be licensed at the bottom in the case of long relativisation.

Underspecification of *local* at the bottom of a resumptive dependency permits mixing of gap and resumptive strategies in ATB extraction, as illustrated by the example below:

(63) [àbōkī-n-ā] dà [[na zìyartā̀ \_] àmmā [bàn sā̀mē *shì* à gidā ba]]<sup>26</sup> (Hausa)
friend-L-1.S.GEN REL 1.S.CPL visit but 1.S.NEG.CPL find 3.S.M.DO at home NEG
'my friend that I visited but did not find at home'

<sup>26</sup>Newman (2000: 539)


The obvious question is, of course, how these two approaches can be harmonised in order to yield a unified HPSG theory of resumption. It is clear that the theory advanced by Crysmann (2012) makes a more fine-grained distinction with regard to SLASH elements and should therefore be able to account trivially for languages where there is no difference in locality restrictions between resumptive and gap dependencies. In the case of Welsh, it will suffice to strengthen the constraints of strong islands, such as relative clauses, to block passing of any *local* on SLASH, rather than merely restricting it to *weak-local*. The other area where the theories need to be brought closer together concerns the issue of McCloskey's generalisation, which is straightforwardly derived by a syntactic theory of resumption, such as Borsley's. Some work in this direction has already been done: Crysmann (2016) suggests replacing his original ambiguity approach with an underspecification approach, essentially following Borsley (2010) in locating the disambiguation between pronoun and resumptive function on the selecting head. While there are still differences of implementation, general agreement has been reached that it should indeed be the head that decides on the pronominal's function, whether this is done via disjunctively amalgamating the index of a pronominal argument (Borsley 2010; Alotaibi & Borsley 2013), or else via a more elaborate system of *synsem* types that integrates more nicely with standard SLASH amalgamation (Crysmann 2016).

Similar consensus has been reached with respect to the need for more fine-grained control on locality, again irrespective of implementation details: while Alotaibi & Borsley (2013) exploited constraints on case marking in order to capture the difference in locality of resumptives and gaps in Modern Standard Arabic, the weight-based analysis by Crysmann (2017) provides a more principled account of the data, essentially obviating stipulative nominative case assignment that fails to correspond to any overtly observable case marking.

Some questions still remain: Taghvaipour (2005: Section 6.5) suggests that in Persian, the distribution of gaps vs. resumptives is partly determined by the constructional properties of the top of the dependency, showing different patterns for *wh*-extraction, free relatives and ordinary relatives, and proposes that constructional properties of the top be transmitted via SLASH. However, percolation of constructional information across the tree does not sit well with basic assumptions of locality within HPSG. It remains to be seen how the case of Persian can be analysed within the scope of the theories outlined above.

Another case study that deserves integration into the current HPSG theory of resumption concerns so-called hybrid chains in Irish (Assmann et al. 2010): in this language, the most deeply embedded complementisers register the difference

between gaps and resumptives at the bottom, yet complementisers further up can switch between "resumptive marking" and "gap marking". While the authors use a single SLASH feature for both types of dependency, the objects in this set remain incompatible, thereby necessitating a great deal of disjunction. In order to bring this analysis fully in line with current HPSG, underspecification techniques may be fruitfully explored.

# **7 More on** *wh***-interrogatives**

# **7.1 Pied piping**

So far, we have concentrated on unbounded dependencies as witnessed by extraction, captured in HPSG by SLASH feature inheritance. Another type of unbounded dependency involves pied-piping, as illustrated in (64b–d) and (65b–d), taken from Ginzburg & Sag (2000: 184).

	- b. I wonder [[*whose* cousin] ate the pastry].
	- c. I wonder [[*whose* cousin's dog] ate the pastry].
	- d. I wonder [[to *whom*] they dedicated the building]
	- b. the person [[*whose* cousin] ate the pastry]
	- c. the person [[*whose* cousin's dog] ate the pastry]
	- d. the person [[to *whom*] they dedicated the building]

In (64), the *wh*-word (a pronoun or determiner) that marks the (embedded) *wh*-interrogative clause may be arbitrarily deeply embedded inside the filler.

With relative clauses too, as witnessed by (65), the relative pronoun may be embedded inside the filler, and, again, arbitrarily deeply. Furthermore, regardless of the level of embedding, the relative pronoun is coreferent with the antecedent noun, so a mechanism is called for that can establish this token identity in a non-local fashion. This is most evident in languages where relative pronouns undergo agreement with the antecedent noun, as e.g. in German:

(66) a. das Buch, [*das* mich inspirierte] (German)
DEF.N.SG book(N).SG REL.N.SG me inspired
'the book that inspired me'


'the book the review of which I liked'

d. die Autorin, [[*deren* / \**dessen* Roman] mir gefiel]
DEF.F.SG author(F).SG REL.F.SG.POSS / REL.M.SG.POSS novel(M) me pleased
'the (female) author whose novel I liked'

In order to capture the fact that the filler of a *wh*-clause must contain a *wh*-word, or that the relative pronoun contained within the filler of a relative clause must structure-share its INDEX with the antecedent noun, HPSG builds on previous work in GPSG (Gazdar et al. 1985: Chapter 5.2), postulating the non-local features QUE/WH and REL. Pollard & Sag (1994: 164) have proposed a single Nonlocal Feature Principle that generalises from SLASH feature percolation to inheritance of QUE and REL, defining the value of each non-local feature of the mother as the set union of the nonlocal features of the daughters. See, however, Sag (1997: Section 4.2) and Ginzburg & Sag (2000: Chapter 7) for a head-driven formulation of nonlocal feature percolation.
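Abstracting away from the TO-BIND bookkeeping that discharges a dependency at its top, the principle can be summarised as follows for each nonlocal feature $F \in \{\text{SLASH}, \text{QUE}, \text{REL}\}$ (a schematic rendering, not Pollard & Sag's exact formulation):

$$F(\textit{mother}) = \bigcup_{d \in \text{DTRS}} F(d)$$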

One observation regarding pied piping in languages such as English or German pertains to the fact that *wh*-words tend to surface in the left periphery of the filler, e.g. (67a). Ginzburg & Sag (2000: 194, fn. 26) suggest that amalgamation of QUE/WH is restricted to the least oblique element on ARG-ST. This enables them to rule out (67b) while still being able to account for standard pied-piping with prepositional phrases (64d).

	- b. \* I wonder [[my picture of whom] was on display].

Indeed, from a cross-linguistic perspective, pied-piping of prepositions appears to be the far less marked option when compared to preposition stranding, which seems to be a peculiarity of English (cf. van Riemsdijk 1978). This is supported not only by the ban on preposition stranding in German, French, and many other languages, but it is also corroborated by the distribution of resumptives (see Section 6).

To summarise, pied piping in HPSG is understood as a phenomenon that involves a second unbounded dependency: in addition to a SLASH dependency between the pied-piped filler and the extraction site, just like the ones we have discussed throughout this chapter, QUE or REL establish dependencies within the filler itself.<sup>27</sup>

# **7.2 Multiple** *wh***-questions**

While in languages such as English, only one *wh*-phrase may be fronted per interrogative clause (and typically one phrase is indeed fronted), it is nevertheless possible to ask multiple questions, with additional *wh*-phrases remaining in situ, as witnessed by *what* in (68).

(68) Who asked who saw what?

According to the theory of Ginzburg & Sag (2000), only fillers in interrogative clauses are *wh*-marked, and *wh*-marking serves to ensure that a *wh*-quantifier contained in the filler is interpreted as a parameter of the local interrogative clause. In situ *wh*-phrases, by contrast, are still quantifiers, so they may scope higher than their syntactic position suggests. Ginzburg & Sag (2000: Section 5.3) follow Pollard & Sag (1994: Section 8.2) in adopting Cooper storage, which enables them to have the in situ *wh*-quantifier in (68) retrieved either as a parameter of the embedded interrogative clause, or as a parameter of the matrix question. The WH feature thus not only ensures that a *wh*-interrogative is marked as such by a filler containing a *wh*-word, but also fixes the semantic scope of ex situ *wh*-phrases to their syntactic scope.<sup>28</sup> In situ *wh*-quantifiers, by contrast, are permitted to take arbitrarily wide scope.

In Slavic languages such as Russian or Serbo-Croatian (Penn 1999), there does not appear to be a constraint on the number of simultaneously fronted *wh*-phrases, as illustrated by the examples in (69) taken from Penn (1999: 163).

	- b. \* Ko si koga mislio da je voleo?
	  who CL.2SG whom thought COMP CL.3SG loved

Given that HPSG's nonlocal features, and in particular SLASH and QUE/WH, are set-valued, multiple *wh*-fronting is a rather expected property. In fact, the grammar of English interrogatives as proposed by Ginzburg & Sag (2000) specifically stipulates that there be only a singleton WH set, and that head-filler structures cannot be recursive.

<sup>27</sup>See also Arnold & Godard (2021: Section 2.1.1), Chapter 14 of this volume on pied piping.

<sup>28</sup>Kathol (1999) uses the QUE feature in his analysis of partial *wh*-fronting in German.

The point where Slavic multiple fronting poses a challenge is its interaction with second position clitics: it seems, as witnessed by the contrast in (69), that multiple fronted *wh*-phrases are treated as a constituent, as far as linearisation is concerned. Penn (1999) proposes a topological analysis based on extended word order domains (Reape 1990; 1994; Kathol 2000; Müller 2021b: Section 6, Chapter 10 of this volume) in order to reconcile multiple fronted constituents with the second position property: in essence, multiple fillers are assigned to the same initial topological field and linearisation of clitics proceeds relative to that same initial field.

# **7.3** *Wh* **in situ**

In the previous subsections, as in most of this chapter, we have concentrated on ex situ *wh*-constructions. However, even in languages like English, and even more so in French, we do find constructions with clear interrogative semantics where the *wh*-phrase nonetheless stays in situ. Moreover, in languages such as Japanese or Coptic Egyptian, in situ realisation is the norm rather than the exception. In this subsection we shall therefore discuss how HPSG's theory of unbounded dependencies has been put to use to account for this phenomenon.

In languages such as English, where standard *wh*-interrogatives are signalled by a *wh*-phrase ex situ (i.e. by a *wh*-filler), Ginzburg & Sag (2000: Chapter 7) identify two types of in situ *wh*-questions: so-called reprise (or "echo") questions, which typically mimic the syntax and semantics of the speech act they are modelled on (e.g. an assertion, an order, etc.), and direct in situ interrogatives, the latter being more strongly restricted pragmatically.

However, *wh* in situ may be an unmarked, or even the default, option for the expression of *wh*-interrogatives: Johnson & Lappin (1997: Section 6.2), studying Iraqi Arabic, made the important observation that *wh*-fronting is optional in this language, posing a challenge for transformational models at the time. In Iraqi Arabic, a *wh*-interrogative may be realised ex situ, as in (70a), or in situ, as in (70b).

(70) a. Mona shaafat meno?<sup>29</sup> (Iraqi Arabic)
Mona saw whom
'Who did Mona see?'

<sup>29</sup>Johnson & Lappin (1997: 318)

b. Meno shaafat Mona?<sup>30</sup>
who saw Mona
'Who did Mona see?'

They propose a straightforward analysis within HPSG, suggesting that what can be regarded as a parochial constraint of English and related languages be dropped, allowing QUE feature percolation from the right clausal daughter.

What is more, they note that *wh* in situ and ex situ strategies do observe different locality restrictions, thereby lending further support to a difference in the type of nonlocal feature involved. While feature percolation for in situ *wh*-constructions cannot escape finite clauses (cf. the contrast in (71a,b)), ex situ *wh*-interrogatives, involving a SLASH dependency, are obviously not subject to this restriction, as witnessed by (71c).<sup>31</sup>

	- b. \* Mona tsawwarat Ali ishtara sheno?
	  Mona thought Ali bought what
	- c. Sheno tsawwarit Mona Ali ishtara?
	  what thought Mona Ali bought
	  'What did Mona think Ali bought?'

Yet, even this constraint, while valid for Iraqi Arabic, must be considered language-specific: Crysmann & Reintges (2014) study Coptic Egyptian, where *wh* in situ is the norm. They observe that the scope of an in situ *wh*-phrase is determined by the position of a relative complementiser and note that it can easily escape finite clauses, as shown in (72).

(72) ere əm=mɛɛʃe tʃoː əmmɔ=s [tʃe ang nim]?<sup>32</sup> (Coptic Egyptian)
REL DEF.PL=crowd say PREP=3F.SG that I who
'Who do the crowds say that I am?' (Luke 9,18)

Their analysis builds on Johnson & Lappin (1997), yet suggests that QUE percolation in this language may be as unrestricted as SLASH percolation.

<sup>30</sup>Johnson & Lappin (1997: 320)

<sup>31</sup>The examples in (71) are from Johnson & Lappin (1997: 318).

<sup>32</sup>Crysmann & Reintges (2014: 72)


# **8 Extraposition**

Another non-local dependency is extraposition, the displacement of a constituent towards the right. Extraposition is most often observed with heavy constituents, such as relative clauses or complement clauses, but it has also been attested with lighter constituents such as prepositional phrases and non-finite VPs. In German, where extraposition is particularly common (Uszkoreit et al. 1998), extraposed material can be extremely light, including adverbs and NPs (see Müller 1999b: Section 13.1 and Müller 2002: ix–xi for examples).

Apart from the obvious difference in the linear direction of the process, extraposition also contrasts with e.g. filler-gap dependencies with respect to the domain of locality: island constraints that have been claimed to hold for extraction to the left, such as the Complex NP Constraint (Ross 1967: Section 4.1), clearly hold with neither complement clause nor relative clause extraposition, as the following examples from Keller (1994: 4, 11) and Müller (G. 1996: 219) show:

(73) a. Planck hat [die Entdeckung \_] gemacht, [daß Licht Teilchennatur hat].<sup>33</sup> (German)
Planck has the discovery made that light particle.nature has
'Planck made the discovery that light has particle properties.'

	- b. \* [die das Stück gelesen hat], habe ich [eine Frau \_] getroffen.<sup>36</sup>
	  who the play read has have I a woman met

<sup>33</sup>Keller (1994: 4)

<sup>34</sup>Keller (1994: 11)

<sup>35</sup>Müller (G. 1996: 219)

<sup>36</sup>Müller (G. 1996: 219)

Conversely, while extraction to the left can easily cross finite clause boundaries (75), extraposition is said to be clause-bound, i.e. subject to the Right Roof Constraint (Ross 1967: Section 5.1.2).

	- b. \* [Daß Peter sich auf das Fest \_ gefreut hat], hat niemanden gewundert, [das Maria veranstaltet hat].<sup>38</sup>
	  that Peter SELF on the party looked.forward has has no.one surprised which Maria organised has

# **8.1 Extraposition via non-local features**

Given the non-local nature of extraposition, a natural approach to this construction is by means of non-local features. Because extraposition differs from extraction in both direction and locality, Keller (1995) and Müller (1999b: Section 13.2) have proposed a distinct non-local feature EXTRA to capture this rightward-oriented dependency. Similar to lexical SLASH introduction, Keller (1995: 303) assumes two lexical extraposition rules, one for complement extraposition, the other for adjunct extraposition.

(77) Complement Extraposition Lexical Rule:

$$\begin{bmatrix} \text{LOC}|\text{CAT}|\text{COMPS} & \boxed{1} \oplus \left\langle \left[ \text{LOC}\ \boxed{2}\left[ \text{CAT}|\text{HEAD}\ \textit{verb} \lor \textit{prep} \right] \right] \right\rangle \\ \text{NONLOC}|\text{EXTRA} & \boxed{3} \end{bmatrix} \mapsto \begin{bmatrix} \text{LOC}|\text{CAT}|\text{COMPS} & \boxed{1} \\ \text{NONLOC}|\text{EXTRA} & \boxed{3} \cup \left\{ \boxed{2} \right\} \end{bmatrix}$$

<sup>37</sup>Wiltschko (1994: 11), Keller (1994: 10)

<sup>38</sup>Wiltschko (1994: 11), Keller (1994: 10)


(78) Adjunct Extraposition Lexical Rule:

$$\begin{bmatrix} \text{LOC} & \boxed{2}\left[ \text{CAT}|\text{HEAD}\ \textit{noun} \lor \textit{verb} \right] \\ \text{NONLOC}|\text{EXTRA} & \boxed{1} \end{bmatrix} \mapsto \begin{bmatrix} \text{LOC} & \boxed{2} \\ \text{NONLOC}|\text{EXTRA} & \boxed{1} \cup \left\{ \begin{bmatrix} \text{CAT}|\text{HEAD} & \begin{bmatrix} \textit{prep} \lor \textit{rel} \\ \text{MOD}|\text{LOC}\ \boxed{2} \end{bmatrix} \end{bmatrix} \right\} \end{bmatrix}$$

The complement extraposition rule is straightforward: it removes a valency from the COMPS list and inserts its LOCAL value into the EXTRA set.

As for adjunct extraposition, the lexical rule equally inserts an element into the EXTRA set, yet constrains it to be a modifier that selects for the local value of the lexical head (via MOD).

Since EXTRA is a nonlocal feature, percolation up the tree, i.e. the middle of the dependency, is handled by the Nonlocal Feature Principle (Pollard & Sag 1994: 164).

At the top, the Head-Extra Schema will bind all extraposition dependencies, which are realised as extraposed daughters.<sup>39</sup>

(79) Head-Extra Schema:

$$\begin{bmatrix} \text{SYNSEM}|\text{NONLOC}|\text{EXTRA} & x \\ \text{DTRS} & \begin{bmatrix} \text{HEAD-DTR}|\text{SYNSEM}|\text{NONLOC}|\text{EXTRA} & \left\{ \boxed{1}, \ldots, \boxed{n} \right\} \cup x \\ \text{NON-HD-DTRS} & \left\langle \left[ \text{SYNSEM}|\text{LOC}\ \boxed{1} \right], \ldots, \left[ \text{SYNSEM}|\text{LOC}\ \boxed{n} \right] \right\rangle \end{bmatrix} \end{bmatrix}$$

The order of extraposed daughters among each other and with respect to the head is regulated by linear precedence statements (see Müller (2021b: Section 2), Chapter 10 of this volume on linear precedence constraints).

Keller (1995) discusses how salient differences between extraction and extraposition can be captured quite straightforwardly: to account, e.g., for clause-boundedness, it is sufficient to restrict the EXTRA set of clausal signs to be the empty set. Similarly, since extraposition (EXTRA) and extraction (SLASH) are implemented by different features, locality constraints imposed on SLASH will not hold for extraposition.
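A constraint enforcing this clause-boundedness restriction might be sketched as follows (our formulation, not Keller's exact statement):

$$\textit{clause} \Rightarrow \left[ \text{SYNSEM}|\text{NONLOC}|\text{EXTRA}\ \{\} \right]$$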

<sup>39</sup>We give a slightly simplified version of the schema, ignoring the PERIPHERY feature that was introduced to control for spurious ambiguity that could arise from string-vacuous extraposition. See Keller (1995: 304–305) for details and Crysmann (2005b) for an alternative solution.

# **8.2 Extraposition as word order variation**

An entirely different approach to extraposition has emerged as part of the HPSG work on linearisation using complex order domains. Following Reape (1994), who suggested that linearisation in scrambling languages such as German should operate on larger domains than local trees of depth one, Kathol (1995; 2000) and Kathol & Pollard (1995) have explored the suitability of such order domains as a model for extraposition in German.

The connection between scrambling and extraposition does have some initial plausibility for freer word order languages such as German, since the maximal domain of extraposition, i.e. the clause, coincides with that of scrambling. However, even for German, extraposition from NPs already necessitates special mechanisms, such as partial compaction, that are specific to extraposition and have no analogous motivation for scrambling, where only union and total compaction are used.<sup>40</sup> Once we approach languages such as English that display a much stricter order, yet still allow extraposition, a scrambling approach to extraposition becomes highly questionable.

# **8.3 Generalised modification**

Another line of proposals capitalises on the differences between complement and adjunct extraposition: as argued by Kiss (2005: 284), the non-locality observed with relative clause extraposition in German, as in (80a), does not translate to complement extraposition in equal measure, cf. (80b).


<sup>40</sup>In linearisation-based HPSG, domain union creates an extended order domain, whereas compaction closes the domain by collapsing the list of domain objects into a single one. See Müller (2021b: Section 6), Chapter 10 of this volume for explanation of linearization-based HPSG in general and Müller (2021b: Section 6.3), Chapter 10 of this volume for a detailed discussion of the specific linearization-based approach to extraposition mentioned above.

<sup>41</sup>Haider (1996: 259)


b. \* Man hat [den Überbringer [der Mitteilung \_ ]] beschimpft, [daß die Erde rund ist].<sup>42</sup>
one has the messenger of.the message insulted that the earth round is
'The messenger was insulted who delivered the message that the world is a sphere.'

While acceptable examples of complement extraposition from complex NPs can be found (see example (83) below), extraposition from adjuncts yields much sharper contrasts, which have not yet been contested:

	- b. Hier habe ich [bei [vielen Versuchen \_ ]] faul auf der Wiese gelegen, bei denen die Schwerkraft überwunden wurde.<sup>44</sup>
	  here have I during many attempts lazily on the lawn laid during which the gravity overcome was
	  'I was lying here lazily on the lawn, during many attempts at which gravity was overcome.'

Interestingly enough, complement extraposition (81a) appears to pattern with leftward extraction (82) in this respect, which underlines the extraction-like property of complement extraposition:

(82) \* Das Verlies hat er, [als er \_ verließ], gelacht.<sup>45</sup>
the dungeons has he when he left laughed
Intended: 'He laughed when he left the dungeons.'

Furthermore, Kiss observes that relative clause extraposition may give rise to split antecedents and therefore concludes that this process is better understood as an anaphoric one, rather than as extraction to the right.

<sup>42</sup>Kiss (2005: 282)

<sup>43</sup>Kiss (2005: 283)

<sup>44</sup>Kiss (2005: 285)

<sup>45</sup>Haider (1996: 261)

Similar in spirit to Culicover & Rochemont (1990), Kiss (2005) suggests that relative clause extraposition can target any referential index introduced within the clause the relative clause attaches to. To that end, he proposes a set-valued ANCHOR feature that indiscriminately percolates the index (and handle) of any nominal expression up the tree. In situ and extraposed relative clauses then semantically bind one of the INDEX/HANDLE pairs contained in the ANCHOR set of the head they syntactically adjoin to.<sup>46,47</sup>
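Schematically (our rendering), ANCHOR percolation amounts to unioning the anchors of all daughters, with no analogue of SLASH binding along the way:

$$\text{ANCHOR}(\textit{mother}) = \bigcup_{d \in \text{DTRS}} \text{ANCHOR}(d)$$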

Figure 12: Anchor percolation in relative clause extraposition (Kiss 2005)

The claim about the locality of complement extraposition has not been left unchallenged: Müller (1999b: 206; 2004a: 10) presents examples of complement clause extraposition that equally defy the Complex NP Constraint.

(83) Ich habe [von [dem Versuch [eines Beweises [der Vermutung \_ ]]]] gehört, [daß es Zahlen gibt, die die folgenden Bedingungen erfüllen].<sup>48</sup>
I have of the attempt of.a proof of.the hypothesis heard that there numbers exist which the following conditions fulfil
'I have heard of the attempt at a proof of the hypothesis that there are numbers which fulfil the following conditions.'

<sup>46</sup>See Koenig & Richter (2021: Section 6.1), Chapter 22 of this volume for an overview of Minimal Recursion Semantics, the meaning description language assumed in Kiss' approach.

<sup>47</sup>Crysmann (2005b) proposes to synthesise the approach by Kiss (2005) with that of Keller (1995), using a two-step percolation mechanism that effectively controls for spurious ambiguity.

Consequently, he suggests that complement extraposition and adjunct extraposition should both be handled by the same mechanism, i.e. a non-local EXTRA feature (Keller 1995; Müller 1999b: Section 13.2).

Crysmann (2013) challenges Müller's unified analysis on the grounds that it severely overgenerates. While he concedes that non-local complement extraposition is indeed possible, he argues that the two processes still need to be distinguished, because (i) only adjunct extraposition may target split antecedents and (ii) complements cannot extrapose out of adjuncts, whereas adjunct extraposition observes no such constraint. He further notes that non-local complement extraposition is subject to stronger bridging requirements than adjunct extraposition, both semantic and prosodic: as illustrated in (84), acceptability greatly improves with the semantic affinity between the complex NP from which extraposition proceeds and the verb that governs it.

(84) a. Er hat [ein Buch [über die Theorie \_ ]] gelesen, [daß Licht Teilchennatur hat].<sup>49</sup>
he has a book about the theory read that light particle.nature has
'He has read a book about the theory that light has particle properties.'

b. \* Er hat [ein Buch [über die Theorie \_ ]] geklaut, [daß Licht Teilchennatur hat].<sup>50</sup>
he has a book about the theory stolen that light particle.nature has
'He has stolen a book about the theory that light has particle properties.'

<sup>48</sup>St. Müller (2004b: 223)

<sup>49</sup>Crysmann (2013: 381)

<sup>50</sup>Crysmann (2013: 381)

	- b. \* [Über Syntax] hat Max [ein Buch \_ ] geklaut.<sup>52</sup>
	  about syntax has Max a book stolen
	  'It's about syntax that Max has stolen a book.'

While this effect for complement extraposition is similar to what has been observed for PP extraction out of NPs (De Kuthy 2002a), cf. the examples in (85), it is of note that no such contrasts can be found for adjunct extraposition:

	- b. Er hat [ein Buch [über die Theorie \_ ]] geklaut, [die derzeit kontrovers diskutiert wird].<sup>54</sup>
	  he has a book(N) about the theory(F) stolen which.F currently controversially discussed is
	  'He has stolen a book about the theory which is under considerable debate at present.'

Crysmann (2013) unifies the anaphoric approach of Kiss (2005) for adjunct extraposition with the rightward-extraction approach of Keller (1995) and Müller (1999b), and suggests that both processes should be modelled by the same set-valued non-local feature (EXTRA), but that elements on that set should be distinguished as to whether they are mainly anaphoric elements (*weak-local*), or full-fledged *local* values (*full-local*), cf. Section 6. Under this perspective, extraposed adjuncts are expected to escape extraction islands (such as adjunct islands), as well as to modify split antecedents, simply because they involve a grammaticalised anaphoric process, not extraction. Conversely, complement extraposition involves an extraction-like dependency, making it more prone to island constraints, which may be bridged (complex NPs) or not (adjunct islands).

<sup>51</sup>De Kuthy (2002b: 148)

<sup>52</sup>De Kuthy (2002b: 148)

<sup>53</sup>Crysmann (2013: 381)

<sup>54</sup>Crysmann (2013: 381)


# **9 Filler-gap mismatches**

As noted in the introduction, there are unbounded dependency constructions in which a filler apparently does not match the associated gap. In this section we will look briefly at two examples of such mismatches.

An interesting type of example is what Arnold & Borsley (2010) call auxiliary-stranding relative clauses (ASRCs). The following illustrate:

	- b. Kim has sung, which Lee hasn't \_.
	- c. Kim is singing, which Lee isn't \_.
	- d. Kim is clever, which Lee isn't \_.
	- e. Kim is in Spain, which Lee isn't \_.
	- f. Kim wants to go home, which Lee doesn't want to \_.

*Which* in these examples appears to be the ordinary nominal *which*, but the gap is a VP in (87a), (87b), (87c) and (87f), an AP in (87d), and a PP in (87e). One response to these data might be to propose that *which* in such examples is not the normal nominal *which*, but a pronominal counterpart of the categories which appear as complements of an auxiliary, mainly various kinds of VP. It is clear, however, that ordinary VP complements of an auxiliary cannot appear as fillers in a relative clause, as shown by the (b) examples in the following:

	- b. \* This is the book, [read which] Kim will \_.
	- b. \* This is the book, [reading which] Kim is \_.

Thus, this does not seem a viable approach.

Arnold & Borsley (2010) propose that these examples involve a special kind of gap. As noted above, in a normal gap, the LOCAL value and the SLASH value match. However, as Webelhuth (2008) noted, there is no reason why we should not under some circumstances have what he calls a "dishonest gap", one whose LOCAL value does not match its SLASH element. Developing this approach, Arnold & Borsley (2010) propose that when an auxiliary has an unrealised complement,

the complement optionally has a certain kind of nominal in SLASH, which is realised as relative *which*. When SLASH has the empty set as its value, the result is an auxiliary complement ellipsis sentence. When SLASH contains a nominal element, we have a dishonest gap, because the value of LOCAL is whatever the auxiliary requires, normally a VP of some kind, and the result is an auxiliary-stranding relative clause.
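In these terms, the unrealised complement of the auxiliary in an ASRC can be sketched as follows (our notation):

$$\begin{bmatrix} \text{LOC} & \text{VP} \\ \text{SLASH} & \left\{ \text{NP}\left[ \textit{rel} \right] \right\} \end{bmatrix}$$

The LOCAL value satisfies the auxiliary's valence requirement, while the SLASH member is what is ultimately realised as relative *which* at the top of the dependency.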

A rather different type of example, discussed, among others, by Bresnan (2001: Chapter 2), Bouma et al. (2001: 25–26), and Webelhuth (2012), is the following:

(91) That he might be wrong, he didn't think of \_.

Here, the apparent filler is a clause, but as the following shows, only an overt NP and not an overt clause is possible in the position of the gap.

	- b. \* He didn't think of that he might be wrong.

The most detailed HPSG discussion of such examples is Webelhuth (2012). Webelhuth argues on the basis of examples like the following that initial clauses cannot be associated with a clausal gap:

	- b. \* [That John is guilty] it seems.

Thus, initial clauses can only be associated with a nominal gap. Bouma et al. (2001: 25–26) propose an analysis in which an NP gap has an S in its SLASH value. In other words, they propose a dishonest gap. Webelhuth (2012) argues against this approach and proposes an analysis in which an S[SLASH {NP}] whose NP member has a clausal interpretation can combine with a finite clause. Thus, Figure 13 gives the schematic structure for (91).

On this analysis, the initial clause is not a filler, and the construction is not a head-filler phrase. However, the analysis involves a normal unbounded dependency except at the top. In contrast, the Arnold and Borsley analysis of ASRCs outlined earlier involves a normal unbounded dependency except at the bottom.

Figure 13: "Dishonest" gap

# **10 Concluding remarks**

The preceding pages have, among other things, highlighted the fact that there are some unresolved issues in the HPSG approach to unbounded dependencies. In particular, there is disagreement about whether or not gaps are empty categories and about whether or not the middle of a dependency is head-driven. It is important, therefore, to emphasise that a number of matters seem reasonably clear. In particular, it is generally accepted that unbounded dependencies involve a set- or list-valued feature called SLASH or, in some recent work, GAP. It is also generally accepted that this is true of all types of unbounded dependencies, including those with a filler and those without, those with a gap and those with a resumptive pronoun, as well as dependencies with or without some kind of mismatch between filler and gap. Finally, it is generally accepted that the hierarchies of phrase types that are a central feature of HPSG provide an appropriate way to capture both the similarities among the many unbounded dependency constructions and the variety of ways in which they differ. The general approach seems to compare quite favourably with the approaches that have been developed within other frameworks.

# **Appendix: Unbounded dependencies in Sign-Based Construction Grammar**

This chapter has concentrated on the approach to unbounded dependencies that has been developed within Constructional HPSG. As has been discussed in a number of chapters,<sup>55</sup> a version of HPSG called Sign-Based Construction Grammar

<sup>55</sup>See Abeillé & Borsley (2021: Section 7.2) and Müller (2021c: Section 1.3.2) for a general comparison of Constructional HPSG and SBCG. Flickinger et al. (2021) discuss the evolution of HPSG; pages 68–70 deal with the HPSG variant SBCG.

(SBCG) was developed in the 2000s, which differs from Constructional HPSG in a number of ways (Sag 2012). Among other things, it has a somewhat different treatment of unbounded dependencies. In this appendix, we outline the main ways in which SBCG is different in this area.

Unlike Constructional HPSG, SBCG makes a fundamental distinction between signs and constructions. Constructions are objects which associate a mother sign (MTR) with a list of daughter signs (DTRS), one of which may be a head daughter (HD-DTR). Headed constructions thus take the following form:
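In outline (a sketch abstracting from Sag's (2012) exact notation):

$$\begin{bmatrix} \text{MTR} & \textit{sign} \\ \text{DTRS} & \left\langle \ldots, \boxed{1}, \ldots \right\rangle \\ \text{HD-DTR} & \boxed{1} \end{bmatrix}$$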


Constructions are utilised by the Sign Principle, which can be formulated as follows:

(97) Signs are well formed if either (i) they satisfy some lexical entry (listeme) of the grammar, or (ii) they are the mother of a construct licensed by some construction of the grammar (Sag 2012).


Constructions and the Sign Principle are features of SBCG which are lacking in Constructional HPSG. Hence, they are complications. But they allow simplifications. In particular, they allow a simpler notion of sign without the features DTRS and HD-DTR. This in turn allows the framework to dispense with *synsem* and *local* objects. The ARG-ST feature and the VALENCE feature, which replaces SUBJ and COMPS, take lists of signs and not *synsem* objects as their value. More importantly in the present context, the GAP feature, which replaces SLASH, takes as its value a list of signs and not *local* objects.

One might suppose that this view of GAP would entail that a filler and the associated gap have all the same syntactic and semantic properties, unlike within Constructional HPSG, where they only share the syntactic and semantic properties that are part of a *local* object and hence not the WH feature in *wh*-interrogatives. However, the framework allows constraints to stipulate that certain objects are the same except for some specified features. The constraint of the filler-head construction, which corresponds to HPSG's head-filler phrase, stipulates that the sign that is the filler is identical to the sign in the GAP list of its sister, except for the value of the WH feature and the REL feature used in relative clauses (Sag 2012: 166). Thus, filler and gap differ in the same way in SBCG and Constructional HPSG, but for different reasons.


At the bottom of the dependency, things are rather different. The SBCG analysis allows a member of the ARG-ST list of a lexical head to appear not as a member of the word's VALENCE list, but as a member of its GAP list. We can illustrate with *read* in the following examples:

(98) a. I will read the book.

b. Which book will you read?

In (98a), *read* has the values in (99) for the three features:

 

$$(99)\quad \begin{bmatrix} \text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}, \boxed{2}\,\text{NP} \right\rangle \\ \text{VALENCE} & \left\langle \boxed{1}, \boxed{2} \right\rangle \\ \text{GAP} & \left\langle\, \right\rangle \end{bmatrix}$$

Here, ARG-ST and VALENCE have the same value, and the value of GAP is the empty list. In (98b), the three features have the following values:

$$(100)\quad \begin{bmatrix} \text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}, \boxed{2}\,\text{NP} \right\rangle \\ \text{VALENCE} & \left\langle \boxed{1} \right\rangle \\ \text{GAP} & \left\langle \boxed{2} \right\rangle \end{bmatrix}$$

The second member of the ARG-ST list appears not in the VALENCE, but in the GAP list. This is rather different from HPSG. As discussed in Section 2, HPSG gaps have a non-empty SLASH value. Here, gaps are just ordinary signs which appear in a GAP list and not in a VALENCE list.

This is an interesting alternative to the approach outlined in the main body of this chapter. However, it would need to be extended to account for some of the phenomena considered here.

# **Abbreviations**

PRT particle
RESUMP resumptive element

# **Acknowledgements**

We would like to thank a reviewer and the editors of the handbook, in particular Jean-Pierre Koenig and Stefan Müller, for their comments and corrections on various versions of this chapter.

# **References**


*Conference on Head-Driven Phrase Structure Grammar, National Institute of Information and Communications Technology, Keihanna*, 325–345. Stanford, CA: CSLI Publications. http://cslipublications.stanford.edu/HPSG/9/arnold-borsley.pdf (5 June, 2019).




*Conference on the Formal Syntax of Natural Language, June 9 - 11, 1976*, 71–132. New York, NY: Academic Press.


*mar, University at Buffalo*, 63–82. Stanford, CA: CSLI Publications. http://cslipublications.stanford.edu/HPSG/2014/crysmann-reintges.pdf (10 February, 2021).







# **Chapter 14**

# **Relative clauses in HPSG**

# Doug Arnold

University of Essex

# Danièle Godard

Université de Paris, Centre national de la recherche scientifique (CNRS)

We provide an extended discussion of analyses of relative clauses (prototypically clauses with a noun modifying function) and related constructions that have appeared in the HPSG literature. The basic theoretical approaches are presented (specifically, the lexical "head-driven" approach associated with earlier work in HPSG and the more recent constructional approach), followed by descriptions of analyses of different kinds of relative clause across a range of typologically diverse languages (notably Arabic, English, French, German, Japanese, and Korean). Phenomena discussed include *wh*-relatives, relatives headed by complementisers, "bare" relatives, non-restrictive relatives, extraposition of relative clauses, relative clause-like constructions that function as complements, various kinds of "dependent noun" and "pseudo" relative clause, and free (headless) relatives.

# **1 Introduction**

The goal of this paper is to give an overview of HPSG analyses of relative clauses. Relative clauses are, typically, sentential constructions that function as nominal modifiers, like the italicised part of (1), for example.

(1) The person *to whom Kim spoke yesterday* claimed to know nothing.

Relative clauses have been an important topic in HPSG: not only as the focus of a considerable amount of descriptive and theoretical work across a range of languages, but also in terms of the theoretical development of the framework.

Doug Arnold & Danièle Godard. 2021. Relative clauses in HPSG. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 595–663. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599844


Notably, Sag's (1997) analysis of English relative clauses was the first fully developed realisation of the constructional approach involving cross-classifying phrase types that has dominated work in HPSG in the last two decades, and was thus the first step towards the development of Sign-Based Construction Grammar (SBCG; cf. Müller 2021d: Section 1.3.2, Chapter 32 of this volume on SBCG and Flickinger, Pollard & Wasow 2021, Chapter 2 of this volume on the evolution of HPSG).

The basic organisation of the discussion is as follows: Section 2 introduces basic ideas and overviews the main analytic techniques that have been used, focusing on one kind of relative clause. Section 3 looks at other kinds of relative clause in a variety of languages. Section 4 looks at a variety of constructions which have some similarity with relative clauses, but which are in some way untypical (e.g. clauses that resemble relative clauses, but which are not nominal modifiers, or which are not adjoined to the nominals they modify). Section 5 provides a conclusion.

# **2 Basic ideas and approaches**

This section introduces basic ideas and intuitions about relative clauses, viewed from an HPSG perspective (Section 2.1), then introduces the two main approaches that have been taken in HPSG: the lexical approach of Pollard & Sag (1994) which makes use of phonologically empty elements (Section 2.2), and the constructional approach of Sag (1997), which makes phonologically empty elements unnecessary (Section 2.3). Section 2.4 presents some interim conclusions, and provides some brief discussion of alternative approaches.

# **2.1 Basic ideas and intuitions**

Relative clauses are, prototypically, sentential constructions which modify a nominal. (2) is an example of one kind of English relative clause, which we will call a "*wh*-relative". In (3) it is used as a modifier of the nominal *person* (the *antecedent* of the relative clause).

(2) to whom Kim spoke yesterday

(3) the person [to whom Kim spoke yesterday]
Syntactically, this kind of relative clause consists of a preposed *wh*-phrase (*to whom*), i.e. a phrase containing a relative pronoun (*whom*), and a clause with a missing constituent — a gap (the complement of *speak*: *Kim spoke* \_ *yesterday*).


This is often called the *relativised constituent*. Semantically, in (3) the interpretation of the relative clause is *intersective*: (3) denotes the intersection of the set of people and the set of entities that Kim spoke to. Getting this interpretation involves combining the descriptive content of the antecedent nominal and the propositional content of the relative clause, and equating the referential indices of the nominal and the relative pronoun, to produce something along the lines of "the set of *x* where *x* is a person and Kim spoke to *x*".

Not all relative clauses have these properties, but they provide a good starting point. In the remainder of this section, we will show, in broad terms, how these properties can be accounted for.

As regards their function and distribution, relative clauses are subordinate clauses, which can be captured by assuming they have a HEAD feature like [MC –], "MAIN-CLAUSE *minus*". They are naturally assumed to be adjuncts: their distribution as nominal adjuncts can be dealt with by assuming that (like other adjuncts) they indicate the sort of head they can modify via a feature like MOD or SELECT. That is, relative clauses such as (2) will be specified as in (4a), whereas adjunct clauses headed by a subordinator like *because* (as in *We're late because it's raining*) will be specified as (4b), and normal, non-adjunct, clauses will typically be specified as (4c):

(4) a. [SYNSEM|LOC|CAT|HEAD|MOD [LOC|CAT|HEAD *noun*]]
    b. [SYNSEM|LOC|CAT|HEAD|MOD [LOC|CAT|HEAD *verb*]]
    c. [SYNSEM|LOC|CAT|HEAD|MOD *none*]
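To make the MOD-based selection in (4) concrete, here is a minimal Python sketch (ours, not part of the HPSG formalism; the function name and the flat category strings are illustrative assumptions):

```python
# Minimal sketch of MOD-based selection in head-adjunct structures.
# An adjunct records the kind of head it can modify as its MOD value;
# combination succeeds only when the head's category matches that value.

def head_adjunct_ok(adjunct_mod, head_cat):
    # MOD none (encoded here as None) marks a non-adjunct,
    # which never combines this way
    return adjunct_mod is not None and adjunct_mod == head_cat

assert head_adjunct_ok("noun", "noun")        # (4a): relative clause + nominal
assert not head_adjunct_ok("verb", "noun")    # (4b): because-clause cannot modify a nominal
assert not head_adjunct_ok(None, "noun")      # (4c): an ordinary, non-adjunct clause
```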

With this in hand, we will look in more detail at the internal structure of this kind of relative clause (Section 2.1.1), and at the relation between the relative clause and its antecedent (Section 2.1.2).

### **2.1.1 The internal structure of the relative clause**

As regards internal structure, it is characteristic of *wh*-relatives that they consist of a preposed *wh*-phrase and a clause containing a gap. The dependency between the *wh*-phrase and the associated gap is potentially unbounded, as can be seen from examples like (5).

(5) the person to whom [Sam said [Kim intended [to speak \_ yesterday]]]

As regards the *wh*-phrase, it is notable that it must be preposed — English does not allow examples like (6a) without a relative phrase, or (6b) where the relative phrase is *in situ*.


	- b. \* a person Kim spoke to whom yesterday

Despite being forbidden *in situ*, the preposed *wh*-phrase behaves in some respects as though it occupied the gap. For example, in the examples above *to whom* satisfies the subcategorisation requirements of *speak*, and makes a semantic contribution in the gapped clause. Assuming some kind of co-indexation relation between the antecedent and the *wh*-phrase, the same behaviour can be seen with subject-verb agreement, as in (7a), and binding, as in (7b):

	- b. a person who [everyone thinks [ \_hates herself/\*her]]

In fact, this dependency between the *wh*-phrase and the gap appears to be a typical filler-gap dependency, with the *wh*-phrase as the filler, which can be handled by standard SLASH inheritance techniques (see Borsley & Crysmann 2021, Chapter 13 of this volume), so that these properties are accounted for.

In examples like (2) the fronted phrase must contain a relative pronoun. Here we have another apparently nonlocal dependency, because the relative pronoun can be embedded arbitrarily deeply inside the *wh*-phrase (example (8d) is due to Ross 1967: 10):

	- a. the person [to whom] Kim spoke \_
	- b. the person [to [[whose children's] friends]] Kim spoke \_
	- c. the person [to [the children [of [whose friends]]] Kim spoke \_
	- d. reports [the height [of [the lettering [on [the covers [of which]]]]] the government prescribes \_

This dependency between a relative pronoun and the phrase that contains it is often called "*wh*-percolation", "relative percolation", or, following Ross (1967), "pied-piping". We will talk about *relative inheritance*.

Notice that as well as being unbounded, relative inheritance resembles SLASH inheritance in that the "bottom" of the inheritance path (i.e. the actual relative pronoun, or the gap in a filler-gap dependency) is typically not a head (e.g. *whom* is not the head of *to whom*). Moreover, though examples involving multiple independent relative pronouns are rather rare in English (i.e. there are few, if any, relative clauses parallel to interrogatives like *Who gave what to whom?*) they exist in other languages, so it is reasonable to assume that relative inheritance


involves a set of some kind.<sup>1</sup> This motivates the introduction of a REL feature which is subject to the same kind of formal mechanisms as SLASH.<sup>2</sup>

The idea is that a relative pronoun will register its presence by introducing a non-empty REL value, which will be inherited upwards until it reaches the top node of the *wh*-phrase (equivalently: a relative clause introduces a non-empty REL value on its *wh*-phrase daughter that is inherited downwards till it is realised as a relative pronoun).<sup>3</sup> Within the *wh*-phrase, REL inheritance can be handled by the same sort of formal apparatus as is used for handling SLASH inheritance. Blocking REL inheritance from carrying a REL element upwards beyond the top of a relative clause can be achieved with the same formal apparatus as is used to block SLASH inheritance from carrying information about a gap higher than the level at which the associated filler appears.<sup>4</sup>
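As a rough illustration of this set-based bookkeeping, the following Python sketch (ours; `Node`, `rel_value`, and the `binds_rel` flag are invented for exposition) percolates REL values upwards and discharges them at the top of the relative clause:

```python
# Illustrative sketch only: REL values modelled as sets of indices,
# percolated bottom-up and "bound off" at the top of the relative
# clause, in analogy with SLASH inheritance.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    daughters: list = field(default_factory=list)
    own_rel: frozenset = frozenset()   # REL introduced here (relative pronouns)
    binds_rel: bool = False            # True at the top of a relative clause

def rel_value(node):
    """REL value of a node: union of the daughters' REL values plus its
    own, emptied where the dependency is discharged (bound off)."""
    inherited = frozenset().union(node.own_rel,
                                  *(rel_value(d) for d in node.daughters))
    return frozenset() if node.binds_rel else inherited

# 'to whom Kim spoke': REL {1} percolates from 'whom' up to the PP filler,
# and is bound off at the relative-clause S node.
whom = Node("whom", own_rel=frozenset({1}))
pp = Node("PP", [Node("to"), whom])
rel_s = Node("S[rel]", [pp, Node("S/PP")], binds_rel=True)
assert rel_value(pp) == frozenset({1})   # the index is visible on the wh-phrase
assert rel_value(rel_s) == frozenset()   # no further upward inheritance
```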

Co-indexation of the antecedent nominal and the relative pronoun can be achieved simply if the REL value contains an index which is shared by both the antecedent and the relative pronoun. As regards the relative pronoun, at the "bottom" of the REL dependency, this can be a matter of lexical stipulation:

<sup>1</sup>Examples of languages which allow multiple relative pronouns include Hindi (e.g. Srivastav 1991) and Marathi (e.g. Dhongde & Wali 2009: Chapter 7). See Pollard & Sag (1994: 227–232) for HPSG analyses. In English, multiple relative pronouns occur in cases of co-ordination (e.g. *the person with whom or for whom you work*), but they are not independent (they relate to the same entity). Kayne (2017) gives some English examples that appear to involve multiple relative pronouns, but they are rather marginal.

<sup>2</sup>The assumption that relative inheritance should be treated as involving an unbounded dependency (i.e. handled with a NONLOCAL feature, like SLASH), has been challenged in Van Eynde (2004) (Van Eynde argues it should be treated as a local dependency).

<sup>3</sup>Note that the relative word has its normal syntactic function as a determiner or a full NP. This is different from most approaches in Categorial Grammar, which assume that the relative word is the functor taking a clause with a gap as argument (Steedman 1996: 49). As Pollard (1988) pointed out, pied-piping data like those discussed in (8) are problematic for Categorial Grammar. These problems were addressed in later Categorial Grammar work, but the solutions involve additional modes of combination. See Müller (2016: Chapter 8.6) for discussion and Kubota (2021), Chapter 29 of this volume for a general comparison of Categorial Grammar and HPSG. Kubota addresses pied-piping on pp. 1360–1361.

<sup>4</sup>In case it is not obvious why further upward inheritance of a REL value would be problematic, notice that while a relative clause can *contain* a *wh*-phrase, it cannot *be* a *wh*-phrase, e.g. it cannot function as the filler in a relative clause. Suppose, counter-factually, the REL value of *whom* could be inherited beyond the relative clause *to whom Kim spoke*, so that e.g. *a person to whom Kim spoke* was marked as [REL { 1 }]. This phrase would be able to function as the *wh*-phrase in a relative clause like \*[*a person to whom Kim spoke*] *Sam recognised* \_, which would be able to combine with a noun specified as [INDEX 1 ] to produce something like \**a person* [[*a person to whom Kim spoke*] *Sam recognised* \_].


Relative pronouns can be lexically specified as having a REL value that contains their INDEX value, roughly as in (9a), which we abbreviate to (9b).<sup>5</sup>

(9) a. Lexical item for a relative pronoun:
       [SYNSEM [LOC [CAT|HEAD *noun*, CONT|INDEX 1], NONLOC|INHER|REL { 1 }]]
    b. N[REL { 1 }]<sub>1</sub>

This index can then be inherited upwards via the REL value to the level of the *wh*-phrase. At the top, the index of the antecedent can be accessed via the MOD value of the relative clause: this is simply a matter of replacing the specification of the MOD value in (4a) with that in (10a), abbreviated as in (10b), where 1 is the index that appears in the REL value of the associated *wh*-phrase.<sup>6</sup>

(10) a. [SYNSEM|LOC|CAT|HEAD|MOD [LOC [CAT|HEAD *noun*, CONT|INDEX 1]]]
     b. S[MOD N<sub>1</sub>]

Schematically, then, *wh*-relatives should have structures along the lines of Figure 1. The top structure here is a head-filler structure. Notice how SLASH inheritance ensures the relevant properties of the PP are shared by lower nodes so that the subcategorisation requirements of the verb can be satisfied, with the PP being interpreted as a complement of the verb (equivalently: SLASH inheritance ensures that the gap caused by the missing complement of *speak* is registered on higher nodes until it is filled by the PP). Similarly, REL inheritance means that the INDEX of the relative pronoun appears on higher nodes so that it can be identified with the INDEX of the antecedent noun, via the MOD value of the highest S (equivalently: the index of the antecedent nominal appears on lower nodes down to the relative pronoun, so that the nominal and the relative pronoun are co-indexed).

<sup>5</sup>Here, and below, we will abbreviate attribute paths where no confusion arises, and use a number of other standard abbreviations; in particular, we write INDEX values as subscripts on nouns and NPs. We use N to indicate a noun with an empty COMPS list, i.e. one which has combined with its complements, if any, and NP for an N with an empty SPR (SPECIFIER) list (e.g. a combination of a determiner and an N). Similarly, we use PP to abbreviate a phrase consisting of a preposition and its complement, VP for a verb with all its arguments except the subject, and S for a verb with all its arguments.

<sup>6</sup>We assume, for simplicity, that the value of REL is a set of indices. This is consistent with e.g. Pollard & Sag (1994: 211) and Sag (1997: 451), but not with Ginzburg & Sag (2000: 188), who assume it is a set of *parameters*, that is, indices with restrictions (a kind of *scope-object*), like the QUE and WH attributes, which are alternative names for the feature that is used for *wh*-inheritance in interrogatives. It is not clear that anything important hangs on this.


Figure 1: Representation of *to whom Kim spoke*

As regards CONTENT, the effect of this will be to give the relative clause *to whom*<sub>*i*</sub> *Kim spoke* an interpretation along the lines of *Kim spoke to whom*<sub>*i*</sub>, where *i* is the index of its antecedent. In terms of standard HPSG semantics, this "internal" content (i.e. the content associated with a verbal head with its complements and modifiers) is a *state-of-affairs* (*soa*), and can be represented as in (11a), abbreviated to (11b):<sup>7</sup>

(11) a. [*soa* NUC [*speak_to* SPEAKER *Kim*, ADDRESSEE 1]]
     b. speak_to(*Kim*, 1)

There are restrictions on what can occur as the preposed *wh*-phrase in a relative clause. However, the matter is not straightforward. There is considerable cross-linguistic variation (cf. for example, Webelhuth 1992: Section 4.3), but even in English the data are problematic. To begin with, examples like (12a) and (12b) suggest that NPs and PPs are fine in English (see also (8) above). Examples like (12c) suggest that Ss are not allowed in English. This much is relatively uncontroversial. However, it is a considerable simplification.

	- a. the person [NP whom] we think Kim spoke to \_
	- b. the person [PP to whom] we think Kim spoke \_
	- c. \* the person [S Kim spoke to whom] we think \_

<sup>7</sup> In fact (11a) is already somewhat abbreviated: [SPEAKER *Kim*] is an abbreviation for a structure including an index, and a BACKGROUND restriction on that index indicating that it stands in the *naming* relation to the name *Kim* (Pollard & Sag 1994: 27).


The status of preposed APs is controversial. At first blush, the strangeness of examples like (13a), as opposed to (13b), suggests they are disallowed.

	- a. \* a person [AP fond of whom] Kim seems \_
	- b. a person [PP of whom] Kim seems fond \_

However, Nanni & Stillings (1978: 311) give examples (14a) and (14b) and argue that *compared* and *seated* can be analysed as adjectives, Webelhuth (1992: 129) gives (14c), which uncontroversially involves an AP, and attested examples like (14d) and (14e) can be found, though they are far from common.<sup>8</sup>

	- b. The tree, [AP seated next to which] they found themselves \_, had been planted on the highest point in the park.
	- c. This is the kind of woman [AP proud of whom] I could never be \_.
	- d. a being [AP greater than which] nothing can be conceived \_
	- e. the principles of international law [contrary to which] Turkey is alleged to have acted

Examples involving adverb phrases are rarely discussed, but they can also be found, though again, they are not common:<sup>9</sup>

(15) Light, [AdvP faster than which] nothing can travel \_, takes 4½ years to get from here to the nearest star.


<sup>8</sup>Examples like (14d) appear often in discussions of theology, especially St. Anselm's "Ontological Argument" for the existence of God. (14e) is from a legal judgement at: http://www.worldcourts.com/pcij/eng/decisions/1927.09.07_lotus.htm, accessed 2021-02-04.

<sup>9</sup> (15) is from *The Guardian* "Notes and Queries" section, 4 July, 2007. Huddleston & Pullum (2002: 1053) give examples of (what they call) "relatives" involving what might be analysed as adverbs *when*, *why*, and *where* in expressions like the following (*where* might also be analysed as prepositional):
(i) the time when Kim spoke to Sam
(ii) the reason why Kim spoke to Sam
(iii) the place where Kim spoke to Sam
These are not typical *wh*-relatives: since these *wh*-words are adjuncts, there is no obvious gap in the clause that accompanies the *wh*-word; moreover clauses like those in (i)–(iii) cannot be associated with just any nominal. For example, Kim may have spoken to Sam because of an insult, but ??*the insult why Kim spoke to Sam* is distinctly odd. These clauses are more plausibly analysed as complements of nouns like *time*, *reason*, and *place*.


This makes for a rather confusing and contradictory picture. For example, why should (13a) be bad, when (14b) with a very similar AP is acceptable? One possible account might be that the problem with (13a) is not the preposed AP, but the imbalance between the relatively long preposed AP and the rest of the relative clause, which consists of just two words — when the rest of the clause is longer, as in (14b), the result is acceptable.

For VP, the situation is similarly complicated. Examples like the following suggest VPs are not allowed in English (cf. (16d) with a preposed PP):

	- b. \* the person [VP to speak to whom] we expect Kim \_
	- c. \* the person [VP speak to whom] we expect Kim to \_
	- d. the person [PP to whom] we expect Kim to speak \_

However, while finite VPs as in (16a) seem genuinely impossible, non-finite VPs are possible in some circumstances: Nanni & Stillings (1978: 311) give example (17a), and Ishihara (1984: 399) gives example (17b), both of which seem fully acceptable.<sup>10</sup>

	- a. The elegant parties, [VP to be admitted to one of which] was a privilege, had usually been held at Delmonico's.
	- b. John went to buy wax for the car, [VP washing which], Mary discovered some scratches of paint.

Thus, while important, the restrictions on preposed phrases in *wh*-relatives are poorly understood, and we will have nothing further to say about them here, except to make two points.

First, leaving aside the empirical difficulties, there are in principle two ways one might approach this issue. One would be to directly impose restrictions on the preposed phrase, as in Sag (1997: 455) (Sag requires the preposed phrase to be headed by either a noun or a preposition — which the foregoing suggests is over-restrictive). Another would be to treat the phenomenon as involving restrictions on the way the REL feature is inherited (i.e. relative inheritance, pied-piping in relative clauses) — e.g. as indicating that while REL-inheritance from e.g. NP to PP (and through an upward chain of NPs, PPs, and some kinds of AP and VP) is permitted, it is blocked by an S node, some kinds of VP (and perhaps other

<sup>10</sup>Notice also that an analogue of (16b) is grammatical in German. See De Kuthy (1999), Hinrichs & Nakazawa (1999) and Müller (1999b: Section 10.7) for discussion and HPSG analyses of the phenomena in German. Some discussion of pied-piping in French can be found in Godard (1992) and Sag & Godard (1994).


phrases). This is the approach taken in Pollard & Sag (1994) (cf. the Clausal REL Prohibition of Pollard & Sag 1994: 220, which requires the REL value of S to be empty, correctly excluding examples like (12c), but allowing the other examples above, including some that should be excluded). These approaches are not equivalent, since the first approach only imposes restrictions on the preposed phrase as a whole, while the second constrains the entire inheritance path between the preposed phrase and the *wh*-word that it contains. It is quite possible that both approaches are necessary.<sup>11</sup>
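The non-equivalence of the two strategies can be illustrated with a small sketch (ours; the category inventory and helper names are simplifying assumptions, not part of either analysis):

```python
# Two ways of restricting pied-piping, sketched over a REL-inheritance
# path given bottom-up (from the relative pronoun's phrase to the filler).

ALLOWED_FILLER = {"NP", "PP"}   # strategy 1: constrain only the preposed phrase
REL_BLOCKERS = {"S"}            # strategy 2: constrain the inheritance path

def ok_by_filler(path):
    return path[-1] in ALLOWED_FILLER

def ok_by_path(path):
    return not any(cat in REL_BLOCKERS for cat in path)

good = ["NP", "PP"]             # e.g. 'to whom'
bad = ["NP", "S", "NP"]         # pied-piping out of a clausal complement
print(ok_by_filler(good), ok_by_path(good))   # True True
print(ok_by_filler(bad), ok_by_path(bad))     # True False: only strategy 2 blocks it
```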

The second point is that it is worthwhile emphasising that restrictions on REL and REL-inheritance are different from the restrictions on QUE and QUE-inheritance (i.e. pied-piping in interrogatives).<sup>12</sup> For example, consider the contrast in (18), which shows that *some pictures of whom* is fine as the initial phrase of a relative clause, as in (18a), but is not possible as the focus of a question, as in (18b):<sup>13</sup>

	- a. a person [some pictures of whom] they were admiring \_
	- b. \* I wonder [some pictures of whom] they were admiring \_.
	- c. I wonder [who] they were admiring some pictures of \_.

Notice that REL and QUE also differ in other ways: e.g. as Sag (2010: 490–493) emphasises, though there are some "*wh*-expressions" which can be interpreted as either interrogative or relative pronouns, there are others which cannot: ones which can be interpreted as interrogative but not as relative pronouns (i.e. which have non-empty QUE values, but empty REL values), and ones which can be


<sup>11</sup>For example, a restriction on the preposed phrase will not be able to distinguish between the following examples (for context, suppose Sam remembers the titles of some books, and also the fact that some books have objectionable titles):

In both cases the preposed phrase is an NP, but in (ii) the relative inheritance path goes through an S — the complement of *fact*, so (ii) would be excluded by something like Clausal REL Prohibition, and allowed otherwise. Here again, we think the facts are unclear: while (ii) is hardly elegant, we are not sure if it is actually ungrammatical.

<sup>12</sup>See for example Horvath (2006: 578–586).

<sup>13</sup>On Ginzburg & Sag's (2000) account, (18b) is excluded by a constraint that requires non-initial elements of ARG-ST to be [WH { }], WH corresponding to what we are here calling QUE (the WH-Constraint, Ginzburg & Sag 2000: 189). In (18b) *some* is the initial element on the ARG-ST of *pictures*, and (*of* ) *whom* is non-initial, hence the ungrammaticality. Clearly, the fact that (18a) is grammatical means there cannot be an exactly parallel restriction on REL.


interpreted as relative pronouns but not interrogatives (i.e. with non-empty REL values, but empty QUE values). For example, *how* and (in standard English) *what* are interrogative pronouns, but not relative pronouns, as the following examples show (as Sag 2010: 493 puts it, there is "no morphological or syntactic unity underlying the concept of an English *wh*-expression"):<sup>14</sup>


With this overview of the internal structure of a relative clause in place, we now turn to the relation between the relative clause and the nominal it modifies (its antecedent).

### **2.1.2 The relative clause and its antecedent**

The combination of a relative clause and the nominal it modifies is traditionally regarded as a head-adjunct structure, where the nominal is the head and the relative clause is the adjunct, as in Figure 2.

Figure 2: A relative clause and its antecedent

The content we want for a modified nominal such as *person to whom Kim spoke*, as for an unmodified nominal such as *person*, is a *restricted index*, i.e. in HPSG

<sup>14</sup>See also Müller (1999a: 81–85) on differences between interrogative and relative pronouns in German. Several non-standard English dialects allow the NP *what* as a relative pronoun like *which* (cf. non-standard *%the book what she bought*, vs. standard *the book which she bought*). No dialect allows determiner *what* as a relative pronoun (though it is fine as an interrogative, as can be seen in (20a)). Sag (2010: 491, note 10) suggests that NP *which* is only ever a relative pronoun (an apparent counter-example like *Which did you buy?* involves determiner *which* with an elliptical noun).

terms a *scope-object* — an INDEX and a RESTR (RESTRICTION) set (a set of objects of type *fact*).<sup>15</sup> For *person*, this is as in (21a), abbreviated as in (21b); for *person to whom Kim spoke*, it is as in (22a), abbreviated as in (22b).

To get the content of *person to whom Kim spoke* from the content of *person* is a matter of producing a *scope-object* whose index is the index of *person* (and the relative pronoun), and whose restrictions are the union of the restrictions of *person* with a set containing a *fact* corresponding to the *state-of-affairs* that is the content of the relative clause. Unioning the restrictions gives the intersective interpretation.
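A minimal sketch of this combination (ours; restrictions are modelled as strings, and the index substitution is a stand-in for HPSG structure sharing):

```python
# Sketch of the intersective combination of a nominal and a relative clause:
# identify the indices and union the restriction sets.

from dataclasses import dataclass

@dataclass(frozen=True)
class ScopeObject:
    index: str            # the referential index
    restr: frozenset      # the restriction set

def modify(nominal, clause_content):
    # identify the relative pronoun's index ('x') with the nominal's index,
    # then union in the resulting restriction
    fact = clause_content.replace("x", nominal.index)
    return ScopeObject(nominal.index, nominal.restr | {fact})

person = ScopeObject("i", frozenset({"person(i)"}))
print(sorted(modify(person, "speak_to(Kim, x)").restr))
# ['person(i)', 'speak_to(Kim, i)']
```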

Conceptually, this is straightforward, but there is a technical difficulty: the structure in Figure 2 is a head-adjunct structure, and in such structures the

<sup>15</sup>In Pollard & Sag (1994), *scope-object*s were called *nom-object*s, and restrictions were sets of *parameterized states of affairs* (*psoa*s), rather than *fact*s. The difference reflects the more comprehensive semantics of Ginzburg & Sag (2000), which involves different kinds of *message* (e.g. *proposition*, *outcome*, and *question*, as well as *fact*). For our purposes, this is just a minor change in feature geometry: *fact*s contain Pollard & Sag-style *state-of-affairs* content as the value of the PROP|SOA path, as can be seen in (21a).


content should come from the adjunct daughter, the relative clause. That is, for "external" semantic purposes (purposes of semantic composition) relative clauses should have *scope-object* content, but as we have seen, their "internal" content is a *soa*. So some special apparatus will be required, as will appear in the following discussion.<sup>16</sup>

This should give the reader an idea of the general shape of an approach to relative clauses like (2) using the HPSG apparatus. In the following sections we will make this more precise by outlining the two main approaches that have been taken to the analysis of relative clauses in HPSG: the lexical approach of Pollard & Sag (1994: Chapter 5), which makes use of phonologically empty elements, and the constructional approach of Sag (1997), which does not.

# **2.2 The lexical approach of Pollard & Sag (1994)**

The idea that relative clauses have a lexical head is appealing for some kinds of relative clause in many languages (see below, e.g. Section 3.2, Section 3.3), but it is problematic for relative clauses like (2): there is no obvious candidate to serve as the head, which is awkward for a lexical, "head-driven" approach such as HPSG. Building on an approach originally proposed by Borsley (1989), the analysis proposed in Pollard & Sag (1994: Chapter 5) overcomes this problem by assuming that relative clauses involve a phonologically empty head, which Pollard & Sag call R ("relativiser"), and which projects an RP (that is, a relative clause).

R is lexically specified to be a nominal modifier (i.e. [MOD *noun*]) which takes two arguments. The first is an XP, the *wh*-phrase, with a REL value which contains the index of the antecedent nominal. The second is sentential, and constrained to have a SLASH value that includes the XP. With some simplifications and some minor modifications to fit the framework we assume here, this is along the lines of (23) (cf. Pollard & Sag 1994: 216). Here XP 3 is intended to mean an XP whose LOCAL value is 3 , and S: 4 means a clause (a saturated projection of type *verb* – i.e. one with empty SUBJ and COMPS specifications) whose CONTENT is 4 . The 2 that appears in the value of RESTR is identical to the RESTR set of the antecedent nominal.

<sup>16</sup>Though the details are HPSG-specific, this is a general problem, regardless of semantic theory. For example, in a setting using standard logical types, relative clauses *qua* clauses (saturated predications) might be assigned type ⟨*e*, *t*⟩, but in order to act as nominal modifiers this predicative semantics must be converted into "attributive" (noun-modifying) semantics, i.e. logical type ⟨*et*, *et*⟩. See e.g. Sag (2010: 521–524) where an HPSG syntax is combined with a conventional predicate-logic-based semantics for relative clauses.

(23) Lexical item for the empty relativiser:

[SYNSEM|LOC [CAT [HEAD [MOD N: [INDEX 1, RESTR 2]],
                  ARG-ST ⟨ XP 3 [REL { 1 }], S: 4 [SLASH { 3 }] ⟩],
             CONT [*scope-obj*, INDEX 1, RESTR 2 ∪ { [*fact* PROP|SOA 4] }]]]

Standard schemas for combining heads with arguments will produce structures like the RP in Figure 3, which (since MOD is a head feature) will inherit the MOD feature from R, and hence combine with a nominal like *person* in a head-adjunct phrase to produce the structure in Figure 3.<sup>17</sup>


Figure 3: A Pollard & Sag (1994)-style structure involving a finite *wh*-relative clause

This captures the properties described above, and resolves the issues mentioned in the following way: the first argument of R is specified as [REL { 1 }]. Thus, it must contain a relative pronoun. Moreover, (23) specifies that the first argument must correspond to a gap in the second argument. Hence cases like (6) where there is no *wh*-phrase, or where the *wh*-phrase is *in situ*, are excluded.
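The selectional checks that (23) builds into R can be sketched as follows (our encoding; the dictionaries stand in for SYNSEM values):

```python
# Sketch of the two checks R's ARG-ST imposes: the first argument must
# contain a relative pronoun (non-empty REL), and its LOCAL value must
# correspond to a gap in the second, clausal argument (SLASH membership).

def r_combines(wh_phrase, clause):
    has_rel_pronoun = bool(wh_phrase["REL"])             # rules out a missing wh-phrase
    gap_matches = wh_phrase["LOCAL"] in clause["SLASH"]  # rules out an in-situ wh-phrase
    return has_rel_pronoun and gap_matches

pp = {"REL": {1}, "LOCAL": "pp_local"}
s_with_gap = {"SLASH": {"pp_local"}}
assert r_combines(pp, s_with_gap)
assert not r_combines({"REL": set(), "LOCAL": "pp_local"}, s_with_gap)
```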

Since R, not the slashed S, is the head of RP, there is no problem of mismatch between the content of the S and the relative clause: R is lexically specified as having *fact* (i.e. *scope-object*) content incorporating the "internal" content of its

<sup>17</sup>Here again we have used PP 3 to indicate a PP whose LOCAL value is 3.


complement clause (tagged 4) in the appropriate way. This *fact* content will be projected to RP by normal principles of semantic composition relating to heads, complements, and subjects, and RP will produce the right content by unioning the restrictions that come from the head nominal with this *fact* content.

This leaves the question of how upward inheritance of the REL and SLASH values can be prevented. The same method is used for both. The idea is that for features like REL and SLASH (nonlocal features) the value on the mother is the union of the values on the daughters, less any indicated as being discharged ("bound off") on the head daughter (the values that are bound off in this way are specified as elements of the value of a TO-BIND attribute). Thus, R can be specified so as to discharge the SLASH value on its S sister (so that R is [SLASH { }]), and we can ensure that the topmost N is [REL { }], so long as its head N daughter is specified as binding-off the REL value on RP. This specification can be imposed by stipulation in the MOD value of R. See Pollard & Sag (1994: Section 5.2.2) for details.
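The TO-BIND mechanism amounts to a subtraction in the set algebra, as in this sketch (ours; `mother_nonlocal` is an invented helper, not part of the formalism):

```python
# Sketch of the inheritance of nonlocal features (REL, SLASH): the mother's
# value is the union of the daughters' values minus whatever the head
# daughter's TO-BIND specification discharges.

def mother_nonlocal(daughter_values, to_bind):
    inherited = set().union(*daughter_values)
    return inherited - to_bind

# R discharges the SLASH of its S sister, so RP ends up [SLASH {}]:
assert mother_nonlocal([set(), {"pp_local"}], {"pp_local"}) == set()
# With nothing to bind off, the value simply percolates upwards:
assert mother_nonlocal([{1}, set()], set()) == {1}
```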

The approach can be extended to deal with other kinds of relative clause by positing alternative forms of empty relativiser (see below and Pollard & Sag 1994: Chapter 5).

The great attraction of the approach is that, apart from R, it requires no special apparatus of any kind. On the other hand, it requires the introduction of a novel part of speech (R), and it posits phonologically empty elements for which there is no independent evidence. Reservations about this led Sag to develop the constructional approach presented in Sag (1997).<sup>18</sup>

# **2.3 The constructional approach of Sag (1997)**

The analysis of English relative clauses in Sag (1997) is constructional and completely dispenses with phonologically empty elements.<sup>19</sup> It involves three main constructions: one for combining relative clauses and nominals, and two for

<sup>18</sup>One detail we ignore here concerns the analysis of "subject" relatives: relative clauses where the relative phrase is a grammatical subject inside the relative clause, as in (i):

(i) person who spoke to Kim

Pollard & Sag (1994) treat such examples specially (cf. Pollard & Sag 1994: 218–219), using the "Subject Extraction Lexical Rule" (SELR) which in essence permits a VP to replace an S in an ARG-ST in the presence of a gap (Pollard & Sag 1994: 174), so that R combines with a VP rather than an S. But this is not an essential part of the analysis of relative clauses: it is motivated by quite independent theoretical considerations (specifically, the assumption that gaps are associated only with non-initial members of ARG-ST lists — cf. the "Trace-Principle"; Pollard & Sag 1994: 172). Hence we ignore it here.

<sup>19</sup>See Müller (2021d), Chapter 32 of this volume, for broader discussion of the constructional approach to HPSG.


relative clauses themselves. One of these is a sub-type of *head-filler-phrase* which takes care of the relationship between the preposed *wh*-phrase and the associated gap (cf. below, (26)). The other involves a number of sub-constructions specific to relative clauses, which are treated as a subtype of *clause* (alongside e.g. *declaratives* and *imperatives*). These are outlined (with some simplifications and minor adjustments) in Figure 4.<sup>20</sup>

Figure 4: Type hierarchy for *clause*, based on Sag (1997)

The *rel-cl* clause type is associated with the constraints in (24), which simply state that relative clauses are subordinate clauses ([MC –]) that modify nouns and have *propositional* content, and that they do not permit subject-aux inversion ([INV –]).<sup>21</sup>

<sup>20</sup>See Kim & Sells (2008: Chapter 11) for an introductory overview of English relative clauses on similar lines to Sag (1997). Sag (2010: 521–524) outlines an approach which is stated using the Sign-Based Construction Grammar style notation (Boas & Sag 2012). Apart from the semantics (which is formulated using the conventional λ-calculus apparatus), it is generally compatible with the earlier analysis described here. One simplification we make here is that we follow the more recent work (e.g. Sag 2010: 523) and do not distinguish subject and non-subject finite relative clauses: Sag (1997) follows Pollard & Sag (1994: Chapter 5) in treating them differently (cf. footnote 18; and see Sag 1997: 452–454), but it is not clear how important this is in the framework of Sag (1997).

<sup>21</sup>Giving relative clauses *propositional* content puts them on a par with other kinds of clause, and is not very different from Pollard & Sag's assumption that clauses have *state-of-affairs* content (since *proposition*s are simply semantic objects which contain a SOA).


(24) *rel-cl* ⇒
     [HEAD [MC –, INV –, MOD [HEAD *noun*]]]

Relative clauses such as that in (2) are what Sag calls *fin-wh-rel-cl*, a sub-type of *wh-rel-cl*. This is associated with the constraints in (25). In words: *wh*-relatives are a subtype of relative clause (as stated in the type hierarchy in Figure 4), where the non-head daughter is required to have a REL value which contains the INDEX of the antecedent.<sup>22</sup>

(25) *wh-rel-cl* ⇒
     [HEAD|MOD N<sub>1</sub>, NON-HD-DTRS ⟨ [REL { 1 }] ⟩]

The framework assumed in Sag (1997) allows multiple inheritance of constraints from different dimensions (Abeillé & Borsley 2021: 18, Chapter 1 of this volume). As well as inheriting properties in the clausal dimension, expressions of type *fin-wh-rel-cl* are also classified in the phrasal dimension as belonging to a sub-type of head-filler phrase (*head-filler-phrase*), thus inheriting constraints as in (26).<sup>23</sup>
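The effect of cross-classification is simply that a maximal type collects the constraints of all its supertypes, across dimensions, as in this sketch (ours; the constraint encoding is drastically simplified):

```python
# Sketch of multiple inheritance across dimensions: fin-wh-rel-cl inherits
# clausal constraints (rel-cl, wh-rel-cl) and phrasal ones (head-filler-phrase).

CONSTRAINTS = {
    "rel-cl": {"MC": "-", "INV": "-"},              # cf. (24)
    "wh-rel-cl": {"NON-HD-REL": "nonempty"},        # cf. (25)
    "head-filler-phrase": {"HD-HEAD": "verbal"},    # cf. (26)
}
SUPERTYPES = {"fin-wh-rel-cl": ["rel-cl", "wh-rel-cl", "head-filler-phrase"]}

def inherited(t):
    constraints = {}
    for supertype in SUPERTYPES.get(t, []):
        constraints.update(CONSTRAINTS[supertype])
    return constraints

print(inherited("fin-wh-rel-cl"))
# {'MC': '-', 'INV': '-', 'NON-HD-REL': 'nonempty', 'HD-HEAD': 'verbal'}
```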

<sup>22</sup>For simplicity and to avoid distractions, we have presented *wh*-relatives as N modifiers in (25). This is a conventional assumption, because standard methods of semantic composition ensure that the content of the relative clause is included in the restrictions of a quantificational determiner (as in *every person to whom Kim spoke*), but it is not Sag's analysis. Instead he takes *wh*-relatives to be NP modifiers, which allows him to account for facts about the ordering of *wh*-relatives and bare relatives (see Sag 1997: 465–469). Kiss (2005: 293–294) gives a number of arguments in favour of this view, for example, the existence of what Link (1984) called "hydras", like (i), where the relative clause must be interpreted as modifying the coordinate structure consisting of the conjoined NPs.

(i) The boy<sub>*i*</sub> and the girl<sub>*j*</sub> who<sub>*i+j*</sub> dated each other are Kim's friends.

Sag's analysis requires a different approach to semantic composition to that assumed here, e.g. one using Minimal Recursion Semantics (MRS, Copestake et al. 2005) or Lexical Resource Semantics (LRS, Richter & Sailer 2004) — see, in particular, Chaves (2007), which provides, *inter alia*, an analysis of coordinate structures and relative clauses using MRS, and Walker (2017), where an approach to the semantics of relative clauses using LRS is worked out in detail. See also Koenig & Richter (2021), Chapter 22 of this volume for an overview of semantic approaches used in HPSG.

<sup>23</sup>The ⊎ symbol here signifies *disjoint union*. This is like normal set union, except that it is undefined for pairs of sets that share common elements (Sag 1997: 445). Its use here is what ensures that the SLASH value of the mother is the SLASH value of the head daughter less the LOCAL value of the non-head daughter.


(26) *head-filler-phrase* ⇒
     [SLASH 1,
      HD-DTR [HEAD *verbal*, SLASH { 2 } ⊎ 1],
      NON-HD-DTRS ⟨ [LOCAL 2] ⟩]

In words: they are *verbal* — e.g. clausal — phrases where the SLASH value of the head daughter is the SLASH value of the mother plus the LOCAL value of the non-head daughter (equivalently, the SLASH value of the mother is the SLASH value of the head daughter less the LOCAL value of the non-head daughter). Headfiller phrases are a sub-type of another phrase type (*head-nexus-phrase*) which specifies identity of content between mother and head daughter.
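The disjoint-union bookkeeping in (26) can be sketched directly (ours; LOCAL values are modelled as atoms):

```python
# Sketch of (26): the head daughter's SLASH decomposes, without overlap,
# into the filler's LOCAL value plus whatever SLASH the mother retains.

def disjoint_union(a, b):
    if a & b:
        raise ValueError("disjoint union is undefined for overlapping sets")
    return a | b

# 'to whom Kim spoke': the mother's SLASH is empty, and the head daughter
# S/PP carries exactly the filler's LOCAL value.
mother_slash = set()
head_slash = disjoint_union({"pp_local"}, mother_slash)
assert head_slash == {"pp_local"}
```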

Putting these together with a constraint that requires clauses to have empty REL values will license local trees like that in Figure 5 for a finite relative clause (*fin-wh-rel-cl*) like (2) (simplifying, and ignoring most irrelevant attributes, and attributes whose values are empty sets or lists).<sup>24</sup>

Figure 5: A Sag (1997)-style structure for a finite *wh*-relative clause

The REL specification on the non-head daughter in (25), which corresponds to the PP in Figure 5, ensures the presence of a *wh*-phrase, and the fact that

<sup>24</sup>This assumption about REL values is one of many minor technical differences between Sag (1997) and Pollard & Sag (1994), where the non-empty REL value is inherited upwards to RP, and is discharged there. This means that for Pollard & Sag, but not for Sag (1997), a *wh*-relative clause is a REL-marked clause.


this is a head-filler phrase ensures that the *wh*-phrase cannot be *in situ* (cf. (6b), above); the [REL { }] on the daughter S excludes the possibility of additional relative pronouns inside the S (i.e. the possibility of multiple relative pronouns, cf. \*(*the person*) *to whom Kim spoke about whom*). REL inheritance will carry the index of the antecedent down into the PP, guaranteeing the presence of a relative pronoun co-indexed with any nominal that this relative clause is used to modify. Further upward inheritance of this REL value is prevented by a requirement that all clauses (including relative clauses) have empty REL values.<sup>25</sup> The SLASH specification on the head S daughter will ensure that the LOCAL value of the PP is inherited lower down inside the S, so that the subcategorisation requirements of *speak* can be satisfied, and the right content is produced for this S (and passed to the mother S, because this is a head-filler phrase).

The task of combining a nominal and a relative clause (in particular, identifying indices and unioning restrictions) involves a further phrase type *head-relative-phrase*, as in (27).<sup>26</sup>

(27) *head-relative-phrase* ⇒
     [HEAD *noun*,
      CONT [INDEX 1, RESTR 2 ∪ { [*fact* PROP 3] }],
      HD-DTR [CONT [INDEX 1, RESTR 2]],
      NON-HD-DTRS ⟨ [CONT 3] ⟩]
In words, this specifies a nominal construction (i.e. one whose head is a noun), whose CONTENT is the same as that of its head daughter, except that the content

<sup>25</sup>Sag's account of the propagation of REL values is a special case of the apparatus that is now frequently assumed for propagation of all nonlocal features, SLASH, WH (i.e. QUE), and BACKGROUND (Ginzburg & Sag 2000: Chapter 5). Upward inheritance is handled by a constraint on *word*s that says that (by default) the REL value of a word is the union of the REL values of its arguments. In the absence of a lexical head with arguments (e.g. in *of whom* and *of whose friends* if *of* is treated simply as a marker) the REL value on a phrase is that of its head daughter (the "*Wh*-Inheritance Principle", WHIP); see Sag 1997: 449. Since these are only default principles, they can be overridden, e.g. by the requirement that clauses have empty REL values.

<sup>26</sup>Sag (1997: 475) uses disjoint set union (⊎) instead of set union (∪) for the computation of RESTR values. While this works for the case at hand, it does not work as a general operation for combining restrictions into sets, since it excludes multiple occurrences of the same predicate in an utterance. Therefore, and for reasons of consistency with other proposals discussed in this chapter and the whole volume, we assume normal set union here. We follow Copestake et al. (2005: 288) in assuming that RESTR values are multisets.


of the non-head-daughter (the relative clause) has been added to its restriction set. (Thus, it is this construction that takes care of the mismatch between the "internal", propositional, CONTENT of the relative clause itself, and its "external" contribution of restrictions on the nominal it modifies.) Since *head-relative-phrase*s are a subtype of *head-adjunct-phrase*, which requires the MOD value of the non-head to be identical to the SYNSEM value of the head (Sag 1997: 475), this will give rise to structures like that in Figure 6.<sup>27</sup>

Figure 6: Sag's (1997) analysis of a relative clause plus its antecedent

From a purely formal point of view, the *head-relative-phrase* construction is not strictly necessary. It would be possible to build its semantic effects into the *rel-cl* construction, so that the structure in Figure 6 would be an entirely normal head-adjunct phrase where the content comes from the adjunct daughter. There are two arguments against this. One is that it would require the relative clause to have nominal (i.e. *scope-object*) content, which is somewhat at odds with its status as a clause. The other is that it would push the semantic mismatch into the relative clause itself. That is, semantically, relative clauses like *to whom Kim spoke* would no longer be normal head-filler phrases where CONTENT is shared between head and mother. Perhaps neither argument is compelling — and in fact, the discussion of relative clauses in Sag (2010: 522) employs essentially this approach, making the *wh*-relative clause construction responsible for converting the propositional semantics of its head daughter into the noun-modifying

<sup>27</sup>This is not the normal semantics associated with head-adjunct phrases (where the content is simply the content of the adjunct daughter). This could be dealt with by introducing a separate sub-type of *head-adjunct-phrase* which deals with content as in (27): *head-adjunct-phrase* itself would impose no constraints on content. Notice that we again follow Ginzburg & Sag (2000: 122, 387) in taking restrictions to be sets of *facts* (Sag 1997 assumes they are sets of *propositions*). Nothing hangs on this.


semantics appropriate for a relative clause. This approach was previously proposed by Müller (1999a: 95); see also Müller & Machicao y Priemer (2019: 345).

# **2.4 Interim conclusions**

The discussion so far has focused on one kind of relative clause, sketched the basic ideas and intuitions behind the HPSG approach, and outlined the two main approaches: that of Pollard & Sag (1994) and that of Sag (1997). At some levels they seem very different (e.g. in the use of phonologically empty lexical heads vs. the use of phrasal constructions), and there are differences in terms of low level technical details (e.g. precisely which phrases are specified as having empty REL values, and in the precise way inheritance of SLASH and REL values is terminated). But in other respects they are very similar: for the most part the same features are used in ways that are not radically different.

More significantly, the approaches involve a common view of the relation between relative clause and antecedent: the view that the relative clause is adjoined to the antecedent, with the relation between the antecedent and the relativised constituent within the relative clause being one of co-indexation (a more or less anaphoric relation): a view that can be traced back to Chomsky (1977).

Outside HPSG this style of analysis stands in contrast to two others: the *raising* analysis (see *inter alia* Schachter 1973; Vergnaud 1974; Kayne 1994: Section 8.2–8.4), and the *matching* analysis (see *inter alia* Chomsky 1965: 137–138; Lees 1961; Sauerland 1998: Section 2.4). Under the raising analysis, the relative clause contains a DP of the form *which*+noun, which is preposed to the beginning of the clause; then the noun is moved out of the relative clause ("raised") to combine with a determiner, which selects both the noun and the relative clause. According to the matching analysis, the relative clause is adjoined to the antecedent, as in the adjunction analysis, but, as in the raising analysis, the relative clause contains a DP *which*+noun, which is preposed to the beginning of the clause; the noun is not raised, but is deleted under identity with the antecedent nominal.

Neither analysis has any appeal from an HPSG perspective: as normally understood, both are fundamentally derivational in nature, presupposing at least two levels of syntactic structure. Moreover, many of the motivations usually cited are absent given standard HPSG assumptions (e.g. arguments from Binding Theory which can be taken as indicating the presence of a *wh*-phrase inside the relative clause fall out naturally without this assumption given the argument-structurebased account of Binding Theory which is standard in HPSG, see Davis, Koenig


& Wechsler 2021, Chapter 9 of this volume for argument structure and Müller 2021a, Chapter 20 of this volume for binding in HPSG). More important, as discussed in Webelhuth et al. (2019), both face numerous empirical difficulties and miss important generalisations which are unproblematic for the style of analysis described here.<sup>28</sup>

# **3 Varieties of relative clause**

In this section we will look at how the approaches introduced above have been adapted and extended to deal with other kinds of relative clause in a variety of languages.<sup>29</sup> Section 3.1 looks at other kinds of relative clause which involve a relative pronoun, notably ones which do not involve a finite verb. Section 3.2 and Section 3.3 look at relative clauses which do not involve relative pronouns: Section 3.2 looks at relative clauses which can be analysed as involving a complementiser; Section 3.3 looks at "bare" relatives, which involve neither relative pronouns nor complementisers. Section 3.4 looks at non-restrictive relative clauses, which lack the intersective semantics associated with prototypical relative clauses.

One dimension of variation among relative clause constructions which we will discuss only in passing relates to whether, in the case of relative clauses that involve a filler-gap construction, the gap is genuinely absent phonologically (as in the examples we have looked at so far), or whether it is realised as a full pronoun (a so-called *resumptive pronoun*) as in (28) from Alqurashi & Borsley (2012: 28), or the English example in (29) — the resumptive pronouns are indicated in italics.


<sup>28</sup>For example, both analyses treat *wh*-words like *who*, *what*, *which*, and their equivalents as determiners, whereas in fact they behave like pronouns. Case assignment appears to pose a fundamental problem for the raising analysis, since it seems to predict that the case properties of the antecedent NP should be assigned "downstairs" inside the relative clause. But they never are (see Webelhuth et al. 2019: 238–239).

<sup>29</sup>In addition to the phenomena and languages we discuss, the HPSG literature includes more or less detailed treatments of relative clauses in Bulgarian (Avgustinova 1996; 1997), German (Müller 1999a; 1999b: Chapter 10), Hausa (Crysmann 2016), Polish (Mykowiecka et al. 2003; Bolc 2005), and Turkish (Güngördü 1996).


The analysis of resumptive pronouns is discussed elsewhere in this volume (Borsley & Crysmann 2021: Section 6, Chapter 13 of this volume), and while they are an important feature of relative clause constructions in many languages (see e.g. Vaillette 2001; 2002; Taghvaipour 2005; Abeillé & Godard 2007; Alotaibi & Borsley 2013), the issues seem to be similar in all constructions involving unbounded dependencies, and not specific to relative clauses.

# **3.1** *Wh***-relatives**

Finite *wh*-relatives in English have been discussed above (Section 2). English also allows *wh*-relatives which are headed by non-finite verbs, such as (30); (31) is a similar example from French.

(30) a person [on whom to place the blame]

	- 'a peacock in whose feathers to place the mail'

Non-finite relatives were not discussed by Pollard & Sag (1994), but Sag's (1997) constructional approach provides a straightforward account. It involves distinguishing two sub-types of *head-filler-phrase*: a finite subtype which has an empty SUBJ list, and a non-finite subtype whose SUBJ list is required to contain just a PRO (that is, a pronominal that is not syntactically expressed as a syntactic daughter). This requirement reflects the fact that non-finite *wh*-relatives do not allow overt subjects:

(32) \* a person [on whom (for) Sam to place the blame]

The relative clause in (30) receives a structure like that in Figure 7. Apart from the finiteness specification, this differs from the finite *wh*-relative in Figure 5 above only in the presence of the PRO on the SUBJ list.<sup>30</sup>

The exclusion of overt subjects is not peculiar to non-finite relatives (it is shared by non-finite interrogatives, cf. *I wonder on whom (\*for Sam) to put the blame*), but non-finite *wh*-relatives are subject to the apparently idiosyncratic restriction that the *wh*-phrase must be a PP:

<sup>30</sup>The use of S*inf* in Figure 7 is an approximation. First, S is standardly an abbreviation for something of type *verb* with empty SUBJ and COMPS values, and here there is a non-empty SUBJ. Second, Sag would have CP instead of S here, reflecting his analysis of *to* as a complementiser rather than an auxiliary verb, as is often assumed in HPSG analyses (e.g. Ginzburg & Sag 2000: 51–52; Levine 2012; Sag et al. 2020: 89). S and CP are not very different (both *verb* and *comp* are subtypes of *verbal*), but Sag (1997: 458) is careful to treat *to* as a *comp* and non-finite *wh*-relatives as CPs because this gives a principled basis for excluding overt subjects.

Figure 7: Sag's (1997: 462) analysis of a non-finite *wh*-relative clause (*inf-wh-rel-cl*)

(33) a. a person [on whom to place the blame]
     b. \* a person [whom to place the blame on]
The relevant constraints can be stated directly — roughly as in (34) (disregarding constraints that are inherited from elsewhere). In words, these constraints say that a non-finite head-filler phrase must have an unexpressed subject, and a non-finite *wh*-relative clause is a non-finite head-filler phrase whose non-head daughter is a PP.

(34) a. *inf-head-filler-phrase* ⇒
        [HD-DTR [HEAD [VFORM *non-finite*], SUBJ ⟨ PRO ⟩]]
     b. *inf-wh-rel-cl* ⇒
        [NON-HD-DTRS ⟨ PP ⟩]
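In the same simplified encoding used above, the checks in (34) amount to the following sketch (ours; not Sag's formulation):

```python
# Sketch of (34): a non-finite wh-relative needs a non-finite head daughter
# with an unexpressed (PRO) subject, and its filler daughter must be a PP.

def inf_wh_rel_ok(head_dtr, filler_cat):
    nonfinite = head_dtr["VFORM"] == "non-finite"
    unexpressed_subj = head_dtr["SUBJ"] == ["PRO"]
    return nonfinite and unexpressed_subj and filler_cat == "PP"

assert inf_wh_rel_ok({"VFORM": "non-finite", "SUBJ": ["PRO"]}, "PP")       # (30)
assert not inf_wh_rel_ok({"VFORM": "non-finite", "SUBJ": ["Sam"]}, "PP")   # (32)
assert not inf_wh_rel_ok({"VFORM": "non-finite", "SUBJ": ["PRO"]}, "NP")   # (33b)
```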

# **3.2 Complementiser relatives**

As well as *wh*-relatives, which involve relative pronouns, there are cases of relative clauses which appear to be headed by what is plausibly analysed as a complementiser. In this section we look first at Arabic, where a complementiser analysis


has been proposed, and then at English, where such an analysis seems possible for some cases, but where it is controversial. We also discuss an interesting construction in French.<sup>31</sup>

### **3.2.1 Arabic**

Alqurashi & Borsley (2012) argue that in Arabic finite relatives the word *ʔallaði* 'that' (transliterated as *llaði* in (35), from Alqurashi & Borsley 2012: 27) and its inflectional variants should be analysed as a complementiser, with a SYNSEM value roughly as in (36).<sup>32</sup>

(36) [LOC [CAT [HEAD [MOD NP*def*: [INDEX 1, RESTR 2]],
               COMPS ⟨ S*fin*: 3 [SLASH { NP<sub>1</sub> }] ⟩],
          CONT [INDEX 1, RESTR 2 ∪ { 3 }]],
     SLASH { }]
<sup>31</sup>There are also cases which involve a relative pronoun *and* a complementiser, as in the following from Hinrichs & Nakazawa's (2002) discussion of Bavarian German:

(i) der Mantl (den) wo i kaffd hob (Bavarian German)
    the coat (which) that I bought have
    'the coat which I bought'
Hinrichs & Nakazawa (2002) analyse these as *wh*-relatives, even when the relative pronoun is omitted, as it can be under certain circumstances. In the course of a discussion of unbounded dependencies in Irish, Assmann et al. (2010) discuss how Irish relative clauses can be analysed in HPSG. Their analysis assumes the simultaneous presence of overt complementisers and phonologically null relative pronouns.

<sup>32</sup>Here S*fin* means a finite clause (a *verb* which is COMPS and SUBJ saturated). NP*def* in the MOD means a fully saturated definite nominal whose CONTENT is given after the colon. According to (36) the content of the S*fin* is merged with the restrictions of this modified NP. This is imprecise: as discussed above, what should be merged is a *fact* constructed from the content of the S*fin*.


According to this, *ʔallaði* 'that' will combine with a slashed finite sentential complement, to produce a phrase which will modify a definite NP. When it combines with that NP, its content will have the same INDEX as the NP, and the restrictions of the NP combined with the propositional content of the sentential complement. The SLASH value on the sentential complement means that it will contain a gap (or a resumptive pronoun) which also bears the same index.

Notice that there is no role for a REL feature here (obviously, since there is no relative pronoun). The presence of the SLASH value indicates that Alqurashi & Borsley assume that Arabic relatives involve an unbounded dependency (i.e. that the gap or resumptive pronoun may be embedded arbitrarily deeply within the relative clause). In *wh*-relatives, as described above, the unbounded dependency is what Pollard & Sag (1994: 155) call a "strong" unbounded dependency, i.e. one that is terminated at the top by a filler (the *wh*-phrase), in a head-filler phrase. This is not the case here — here there is no filler, and upward inheritance of the gap is halted by the head *ʔallaði* 'that' itself (cf. its own empty SLASH specification). That is, Arabic relatives (and complementiser relatives generally) are normal head-complement structures, involving what Pollard & Sag (1994: 155) call a "weak" unbounded dependency construction (like English purpose clauses and *tough*-constructions).<sup>33</sup>
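The contrast with the filler-based analysis can be sketched as follows (ours; the dictionary encoding of the selected clause is an assumption):

```python
# Sketch of a "weak" unbounded dependency: the complementiser itself selects
# a SLASH-marked finite clause and halts further inheritance of the gap,
# so no filler daughter is involved.

def allathi_phrase(s_fin, antecedent_index):
    # the clause must contain a gap (or resumptive) bearing the antecedent's index
    assert f"NP_{antecedent_index}" in s_fin["SLASH"]
    return {"SLASH": set(),                 # inheritance stops at the head
            "MOD_INDEX": antecedent_index}  # what the resulting phrase modifies

rp = allathi_phrase({"SLASH": {"NP_i"}}, "i")
assert rp["SLASH"] == set()
```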

Since *ʔallaði* 'that' shows inflections agreeing with the antecedent NP for NUMBER, GENDER, and CASE, different forms will impose additional restrictions on the modified NP (e.g. the form transliterated as *llaði* in (35) will add to (36) the additional requirement that the NP which is modified must be masculine singular).

Notice that Alqurashi & Borsley's account is entirely lexical: no constructional apparatus is used at all. Hahn (2012) argues for a constructional alternative.<sup>34</sup>

### **3.2.2 English**

A similar analysis can be proposed for English *that*-relatives as in (37) (see, for example, Borsley & Crysmann (2021: 557), Chapter 13 of this volume for an appropriate lexical entry for *that* on this approach). However, historically, this approach has not always been favoured: Pollard & Sag (1994) treated some uses of *that* as simply a marker (i.e. the realisation of a MARKING feature whose value

<sup>33</sup>Alqurashi & Borsley (2012: 42) assume that SLASH inheritance is governed by a default principle, so the empty SLASH specification on *ʔallaði* 'that' prevents upward inheritance. The same effect could be achieved in other ways (e.g. with an appropriate TO-BIND specification).

<sup>34</sup>Arabic also has finite relatives that do not have an overt relativiser (and which occur with indefinite antecedents). Alqurashi & Borsley analyse these as involving a phonetically null complementiser. In addition, Arabic also has non-finite and free relatives, which have received some attention. See Melnik (2006), Haddar et al. (2009); Zalila & Haddar (2011), Hahn (2012), and Crysmann & Reintges (2014) for further discussion.


is *that*, as opposed to *unmarked*), and others as a relative pronoun (see Pollard & Sag 1994: 221–222). Sag (1997: 462–464) preferred to treat *that* as a relative pronoun.<sup>35</sup>

(37) a. person that \_ admires Kim

b. person that everyone thinks \_ admires Kim

As regards Pollard & Sag's (1994) analysis, it may be recalled that this involves a non-empty REL value on the relative clause (cf. Figure 3). The fact that it is possible to coordinate *that* relatives with normal *wh*-relatives quite freely, as in (38), is a natural consequence if the REL value of the coordinate structure is shared by both conjuncts (implying that both conjuncts contain relative pronouns, of course).

(38) a book [that/which you own or that/which you can borrow]

On Sag's (1997) analysis, relative clauses (in fact clauses in general) are required to have empty REL values (cf. above Section 2.3, especially footnote 24) so similarity of REL values is not an issue. However, there is another issue: Sag (1997) assumes that all and only *wh*-relatives are NP modifiers (rather than N modifiers as we have presented them here, cf. footnote 22). Since coordination involves identity of MOD values, data like (38) lead Sag to conclude that *that*-relatives must be NP modifiers, and consequently must be *wh*-relatives, i.e. must contain a relative pronoun (namely, *that*).

Potential evidence against analysing *that* as a relative pronoun, and in favour of a complementiser-style (or perhaps marker-style) analysis, is that, unlike normal relative pronouns, *that* does not allow pied-piping, cf. (39b).

(39) a. the person that I spoke to \_

b. \* the person to that I spoke \_

Sag (1997: 464) and Pollard & Sag (1994: 220) argue that this restriction is compatible with a relative pronoun analysis on the assumption that *that* has nominative case, which prevents it occurring as e.g. the complement of a preposition. Sag observes that *who* (which is generally regarded as a relative pronoun) follows the same pattern:

<sup>35</sup>Pollard & Sag (1994: Section 5.2.3) treat instances of *that* in relative clauses involving relativisation of a top level subject, like (37a), as a relative pronoun. In other relative clauses, in particular those involving relativisation of embedded subjects, like (37b), or non-subjects, *that* is treated as a marker, meaning that such clauses are treated as instances of bare relatives. It is hard to find clear empirical evidence against this, but an analysis which provides a uniform treatment of English *that*-relatives is clearly more appealing.


(40) a. the person who I spoke to \_

b. \* the person to who I spoke \_

However, this line of argument is not very convincing. What (39) shows is that *that* cannot appear as the complement of a preposition, but can be associated with a gap that is the complement of a preposition. But this makes it difficult to analyse it as a filler in a head-filler phrase, where SLASH inheritance ensures identity between the LOCAL values of filler and gap (including, of course, CASE): if *that* is nominative, then it should not be compatible with non-nominative gaps, such as we see in (39a). But if it is not a filler, then it must be a head (or marker).

Treating *that* as a head, presumably a complementiser, is in some respects straightforward (the lexical entry in Borsley & Crysmann (2021: 557), Chapter 13 of this volume is a starting point), but it also raises questions that go well beyond the scope of this paper. For example, in the context of Sag's (1997) analysis, it is clear that such an approach requires the introduction of a new sub-type of *rel-cl*: one headed by a particular version of *that*. But it does not settle the question of the relationship this new type of relative clause should have to the existing types (i.e. precisely where in the type hierarchy it should sit), or how the requirement of *that* as the head should be imposed.<sup>36</sup>

### **3.2.3 French**

Besides *wh*-relatives, French has relatives introduced by complementisers: *que* 'that' and *dont* 'of which'. *Dont*-relatives present something of a challenge, which is addressed in Abeillé & Godard (2007). They analyse *dont* as a complementiser introducing finite relatives, following Godard (1988) (see e.g. Abeillé & Godard 2007: Section 2.1). It can introduce a relative with a PP*de* gap (i.e. a gap that could be occupied by a PP marked with the preposition *de* 'of'). The contrast between the grammatical (41a) and the ungrammatical (41b) arises because whereas *parler* 'talk' in (41a) takes a PP*de* complement, *comprendre* 'understand' in (41b) takes an NP complement, and so cannot contain a gap licensed by *dont*, as can be seen in (42a) and (42b).


<sup>36</sup>It also ignores the analysis of *who*, which one would presumably not want to treat as a complementiser. An appealing idea is to accept that *who* is nominative as a way of ruling out (40b), and hope that a treatment of other filler-gap mismatches will provide an account of why (40a) is acceptable (see Borsley & Crysmann 2021: Section 9, Chapter 13 of this volume, and references there).


	- b. \* On résoudra d'un problème.
one will.resolve of a problem
Intended: 'We will resolve a problem.'

Abeillé & Godard (2007: 54) suggest a lexical entry for *dont* with a SYNSEM value along the lines of (43) (cf. also Winckel & Abeillé 2020: 112).

(43) Lexical entry for the French complementiser *dont*:

```
[ HEAD    [ comp, MOD N'_1 ]
  COMPS < S[fin, SLASH { [nprl, CAT PP_de, INDEX 1] } ∪ [4] ] >
  SLASH   [4] ]
```

In words: *dont* is a complementiser that takes a finite S complement, and heads a phrase that can act as an N modifier. *Dont* itself has no inherent semantic content (it simply combines the CONTENT of its complement S with that of the nominal that the relative clause will modify). The complement S has a SLASH value that contains a PP*de* which is co-indexed with the antecedent nominal, as specified in the MOD value. This SLASH element is non-pronominal (*nprl*), that is, a genuine gap rather than a resumptive pronoun, and is not inherited upwards (only [4], the remaining set of SLASH values, is inherited upwards).<sup>37</sup>
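Purely for concreteness, the gist of (43) can be put in executable form. The following Python sketch uses our own simplified encoding (dicts for feature structures, an integer for the shared index tag); it is not HPSG code and not Abeillé & Godard's formalism:

```python
# DONT is a toy rendering of (43); feature names are simplified and the
# integer 1 plays the role of the shared index tag.
DONT = {
    "HEAD": {"POS": "comp", "MOD": {"CAT": "N-bar", "INDEX": 1}},
    "COMPS": [{
        "CAT": "S", "VFORM": "fin",
        # the complement clause must supply this gap in its SLASH set:
        "REQUIRED_GAP": {"TYPE": "nprl", "CAT": "PP_de", "INDEX": 1},
    }],
}

def licenses(entry, clause_slash):
    """Does some member of the clause's SLASH set match the required gap?"""
    need = entry["COMPS"][0]["REQUIRED_GAP"]
    return any(all(g.get(k) == v for k, v in need.items()) for g in clause_slash)

good = [{"TYPE": "nprl", "CAT": "PP_de", "INDEX": 1}]  # a genuine PP_de gap
bad  = [{"TYPE": "nprl", "CAT": "NP", "INDEX": 1}]     # an NP gap: not licensed
print(licenses(DONT, good), licenses(DONT, bad))       # True False
```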

Given this, one might expect that it is generally impossible for a *dont*-relative to have an NP as the relativised constituent, but this is not the case. It is in

<sup>37</sup>Abeillé & Godard (2007: Section 3.4) assume that gaps and resumptive pronouns are associated with distinct subtypes of *local* value: *prl* (pronominal) for pronouns and *nprl* (non-pronominal) for genuine gaps. The relevance of this will become apparent shortly.


fact possible, provided that the relativised constituent is realised by an overt pronoun (i.e. a resumptive pronoun) and is somewhere inside the complement of (some) propositional attitude and communication predicates. For example, in (44) the pronoun *le* represents the relativised constituent, which appears in the complement of *être certain* 'be sure'.<sup>38</sup>

(44) un problème dont [Paul est certain [qu' on *le* résoudra]] (French)
a problem of.which Paul is sure that one it will.solve
'a problem that Paul is sure that we will solve'

Unsurprisingly, the presence of a resumptive pronoun is associated with immunity to island constraints. So, for example, in (45) we have a relative where the relativised constituent is within a relative clause inside an embedded NP, which is impossible for a genuine gap.

(45) un problème dont [Paul est certain [qu' il y a [quelqu'un qui *le* résoudra]]] (French)
a problem of.which Paul is sure that there is someone who it will.solve
'a problem such that Paul is sure that there is someone who will solve it'

What is surprising, however, is that the path between *dont* and the predicate of which the resumptive is a complement *is* sensitive to island constraints. To see this, compare the grammatical (44) and (45) with the ungrammatical (46). All involve a *dont* relative containing a resumptive pronoun licensed by *être certain*, but in (46), *être certain* is separated from *dont* by an island boundary (*être certain* is inside a relative clause).


<sup>38</sup>One might consider an alternative analysis where *dont* is associated with a PP*de* gap dependent of *certain*, and the resumptive pronoun is a normal anaphoric pronoun — this would correspond to a main clause along the lines of *Paul is sure, of this problem, that we will resolve it*. One problem with this alternative is that this sort of PP*de* dependent is not very good with *certain*, see (i). Another is that it would not explain the fact that the personal pronoun is obligatory: (ii), with no personal pronoun, is ungrammatical, though semantically coherent.


(46) \* un problème dont il y a [quelqu'un qui est certain qu' on *le* résoudra] (French)
a problem of.which there is someone who is sure that one it will.solve

In short, though the dependency between the licensing predicate and the resumptive pronoun can cross island boundaries, the dependency between the licensing predicate and *dont* cannot. Abeillé & Godard's (2007) account of this is that while the dependency between the licensing predicate and the relativised constituent involves inheritance of the LOCAL value of a resumptive element, the one between the licensing predicate and *dont* involves inheritance of a gap. They suggest that this should be dealt with by a lexical rule along the lines of (47), where ⊕ signifies the "append" relation; in combination with the ellipsis it allows the possibility that the COMPS list may contain additional elements.

(47) Lexical rule for propositional attitude predicates in French:

```
[ COMPS < CP[ SLASH { [2][prl, INDEX i] } ∪ [3] ] > ⊕ ... ]
      ↦
[ SLASH { [1][nprl, CAT PP_de, INDEX i] } ∪ [4] ]
```

In words, the left-hand side of this describes a lexeme that takes a CP complement with a SLASH value containing a pronominal (*prl*) element (that is, a CP containing a resumptive pronoun). The effect of the rule is to provide a lexical entry that binds off the resumptive pronoun by not passing it up in its own SLASH value. Instead the newly licensed lexical entry introduces a PP*de* gap co-indexed to the resumptive pronoun, that is, the sort of gap that can legitimately be associated with *dont*. The information about the COMPS list is taken over to the output lexical entry by convention, since it is not mentioned in the output. Thinking from the top down, this rule produces a predicate that can appear in a context with an inherited requirement for a PP*de* gap (e.g. a relative clause headed by *dont*), converting this into a requirement for a resumptive pronoun further down. Thinking from the bottom up, the predicate can bind off a resumptive pronoun, and


replace it with a gap dependency.<sup>39</sup> The SLASH value [3] in the input registers the possibility that the CP complement may contain other gaps, as in (48), where *déclare* 'states' is the verb which has undergone the lexical rule, the pronominal is *il*, and *combien* 'how much' is extracted from its CP complement.

(48) un homme politique dont on vérifie combien la société déclare qu'il a été payé (French)
a politician of.which we check how.much the company states that he has been paid
'a politician whose stated remuneration package is being checked'

In addition, the other (possible) complements of the predicate (abbreviated by … in (47)) may also contain a gap. Given the SLASH Amalgamation Principle (see Borsley & Crysmann 2021: 549, Chapter 13 of this volume), all the SLASH values in the complements are amalgamated by the predicate, resulting in the SLASH value { [1] } ∪ [4]. The information about SLASH elements coming from arguments other than the CP is carried over to the output of the lexical rule. Usually [4] will be the empty set.
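The input/output relationship that (47) expresses can be mimicked in a few lines of Python. Again this is our own toy encoding (SLASH sets as lists of dicts, tags as shared values), offered only as a sketch:

```python
def dont_lexical_rule(lexeme):
    """Toy version of (47): bind off a prl (resumptive) element in the CP
    complement's SLASH and introduce a co-indexed nprl PP_de gap instead."""
    cp = next(c for c in lexeme["COMPS"] if c["CAT"] == "CP")
    prl = next(g for g in cp["SLASH"] if g["TYPE"] == "prl")
    rest = [g for g in cp["SLASH"] if g is not prl]  # tag [3]: other gaps in the CP
    other = lexeme.get("SLASH_OTHER", [])            # tag [4]: gaps from other arguments
    out = dict(lexeme)
    # tag [1]: the new gap, co-indexed with the bound-off resumptive;
    # [3] and [4] percolate as usual
    out["SLASH"] = ([{"TYPE": "nprl", "CAT": "PP_de", "INDEX": prl["INDEX"]}]
                    + rest + other)
    return out

etre_certain = {
    "COMPS": [{"CAT": "CP",
               "SLASH": [{"TYPE": "prl", "CAT": "NP", "INDEX": 7}]}],
}
print(dont_lexical_rule(etre_certain)["SLASH"])
# [{'TYPE': 'nprl', 'CAT': 'PP_de', 'INDEX': 7}]
```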

# **3.3 Bare relatives**

Not all languages realise relative clauses using relative pronouns or complementisers. In this section we will discuss HPSG analyses of what we will call *bare relatives* in Japanese and Korean (Section 3.3.1) and in English, where they are often called "*that*-less" relatives (Section 3.3.2). The absence of relative pronouns

<sup>39</sup>As Abeillé & Godard (2007) point out, the facts are not quite as simple as this. In particular there is an interesting complication involving coordination. It is possible for a *dont*-clause containing a predicate like *être certain* to involve a coordinate structure, where one conjunct contains a PP*de* gap and the other contains a pronoun, as in (i) (the second conjunct here contains the pronominal *y* 'to it'; the English translation is intended to make it clear that the second conjunct is in the scope of *être certain*).

(i) un problème dont Paul est certain [[que nous avons parlé \_] [et que nous *y* reviendrons plus tard]] (French)
a problem of.which Paul is sure that we have spoken and that we to.it will.come.back more late
Lit: 'a problem of which Paul is sure that we have spoken and that he is sure that we will come back to it later'

Dealing with this involves a formal complication that we leave aside here. See Abeillé & Godard (2007: Section 3.4).


means there is no question of pied-piping, hence no role for a REL feature in these constructions.

### **3.3.1 Bare relatives in Japanese and Korean**

Japanese relative clauses corresponding to (2) contain a gap, but are otherwise similar to normal clauses, cf. (49) (from Sirai & Gunji 1998: 18); in Korean they are distinguished by special marking on the topmost verb — cf. the *-nun* affix on *sayngkakha* 'think' in (50) (from Kim 2016b: 285).


(50) [motwu-ka [Kim-i \_*i* ilk-ess-ta-ko] sayngkakha-nun] chayk*i* (Korean)
everyone-NOM Kim-NOM read-PST-DECL-COMP think-PRS.MOD book
'the book (that) everyone thinks Kim read'

Evidence for a gap in these examples is that it is not possible to put an overt NP in place of the gap (e.g. putting *sore-wo* 'it-ACC' in (49), or *sosel-ul* 'novel-ACC' in (50) renders them ungrammatical).<sup>40</sup>

Sirai & Gunji (1998) provide a non-constructional account of Japanese bare relatives like (49). They show how an account that uses SLASH inheritance could work, but their actual proposal is SLASH-less. They assume that the tense affixes are heads of verbal predicates, and operate via "predicate composition" — by inheriting the subcategorisation requirements of the associated verb. The adnominal tense affixes are special in that a) they are specified as nominal modifiers, and b) they inherit the subcategorisation requirements of the associated verb, less an NP that is co-indexed with the modified nominal. (A lexical equivalent of this could be implemented with a lexical rule which removes an element from a verb's ARG-ST and introduces a MOD value containing a nominal with the corresponding index, as suggested by Müller (2002: Section 3.2.7) for prenominal adjectives in German; see the sketch after (51) below.) Of course, a SLASH-less account like this will only deal with cases of local relativisation — where the relativised NP is an argument of

<sup>40</sup>As well as these "standard" relatives, Korean and Japanese both have other kinds of relative construction, notably what are sometimes called *internally headed* relatives, and so-called *pseudorelatives*, which are briefly discussed below. See Section 4.2.2.


the highest verb. Sirai & Gunji argue that cases of nonlocal relativisation, like (51), should be treated as involving null pronominals (which are a common feature of Japanese). They suggest that the requirement that the modified noun and the pronoun be co-indexed should be captured via a pragmatic condition requiring the relative clause to be "about" the modified noun.

(51) [Ken-ga [Eiko-ga \_*i* yon-da] to sinzitei-ru] hon*i* (Japanese)
Ken-NOM Eiko-NOM read.PST COMP believe-PRS book
'the book that Ken believes Eiko read'
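The lexical-rule variant mentioned parenthetically above can be sketched as follows; the representation is a deliberate simplification of ours, not Müller's or Sirai & Gunji's actual proposal:

```python
def adnominal_rule(verb, arg_position):
    """Remove one ARG-ST element and add a MOD value sharing its index."""
    args = list(verb["ARG_ST"])
    removed = args.pop(arg_position)
    return {
        "PHON": verb["PHON"],
        "ARG_ST": args,
        "HEAD": {"POS": "verb",
                 "MOD": {"CAT": "N", "INDEX": removed["INDEX"]}},
    }

# 'yon-da' (read.PST) with a subject (index 1) and an object (index 2)
yonda = {"PHON": "yon-da",
         "ARG_ST": [{"CAT": "NP", "INDEX": 1}, {"CAT": "NP", "INDEX": 2}]}
relverb = adnominal_rule(yonda, 1)    # relativise the object
print(relverb["HEAD"]["MOD"])         # {'CAT': 'N', 'INDEX': 2}
print(relverb["ARG_ST"])              # [{'CAT': 'NP', 'INDEX': 1}]
```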

Kim (2016b) provides a constructional analysis for Korean which resembles Sag's (1997) analysis of English — see also Kim (1998a) and Kim & Yang (2003). He suggests that Korean allows verb lexemes to be realised as "modifier verbs" (*v-mod*) subject to a constraint along the lines of (52) — these are verbs that can head a subordinate clause ([MC –]) which modifies a nominal (N).<sup>41</sup>

```
(52) v-mod ⇒ [ HEAD [ verb, MC –, MOD N ] ]
```

He also proposes a construction (the *head-relative-mod* construction, see Kim 2016b: 290) to combine a structure headed by such a modifier verb with a head nominal, along the lines of (53).<sup>42</sup>

```
(53) hd-relative-mod-phrase ⇒
     [ SLASH       { }
       HD-DTR      [1] N_2
       NON-HD-DTRS < S[ HEAD|MOD [1], SLASH { NP_2 } ] > ]
```

In words: a phrase can consist of a head noun and a clause headed by a modifier verb containing an NP gap which is co-indexed with the head noun. The empty SLASH value on the mother is necessary to prevent the gap being inherited upwards. The SLASH value on the S daughter ensures the presence of an appropriate

<sup>41</sup>Different sub-types of *v-mod* are associated with different tense affixes. (52) differs slightly from Kim's formulation, which involves a POS (part-of-speech) feature and assumes that MOD is list-valued (see Kim 2016b: 285). This is not important here.

<sup>42</sup>Again, our formulation is slightly different from Kim's for the sake of consistency with the rest of our presentation.


gap, and the MOD value on the S daughter ensures that it is headed by a verb with the right morphology. It will license structures like that in Figure 8. Kim does not discuss the semantics, but it would be straightforward to add constraints to this construction along the lines of those presented above.
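For concreteness, the licensing conditions of (53) can be rendered as a toy check. The encoding below is ours and, like the discussion above, ignores the semantics:

```python
def licenses_hd_relative_mod(noun, clause):
    """Toy licensing check for (53)."""
    # MOD re-entrancy (tag [1] in (53)): the clause modifies this very noun
    mod_ok = clause["HEAD"]["MOD"] is noun
    # the clause must contain an NP gap sharing the noun's index; the
    # mother is then built with SLASH { }, binding the gap off
    gap_ok = any(g["CAT"] == "NP" and g["INDEX"] == noun["INDEX"]
                 for g in clause["SLASH"])
    return mod_ok and gap_ok

chayk = {"CAT": "N", "INDEX": 3}                       # 'book'
rel_clause = {"HEAD": {"POS": "v-mod", "MOD": chayk},  # headed by a modifier verb
              "SLASH": [{"CAT": "NP", "INDEX": 3}]}
print(licenses_hd_relative_mod(chayk, rel_clause))     # True
```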

Figure 8: A Korean relative clause, based on Kim (2016b: 295)

### **3.3.2 Bare relatives in English**

English also has bare relative clauses, both finite, as in (54a), and non-finite as in (54b):

(54) a. the cakes Kim bought \_

b. some cakes (for Sam) to eat \_

In English, there is no obvious motivation for suggesting a special sub-type of "relative clause heading" verb, so an alternative way of licensing noun-modifying


clauses with appropriate SLASH values is required. In Pollard & Sag (1994) this was the role of an empty relativiser similar to that described above, differing only in taking a single argument — a slashed clause (see Pollard & Sag 1994: 222; recall that the relativiser discussed above takes two arguments: a *wh*-phrase, and a slashed clause). This gives structures like that in Figure 9.<sup>43</sup>

Figure 9: A Pollard & Sag (1994)-style structure for an English bare relative

In Sag (1997) the task of licensing such bare relatives is carried out by a unary branching construction (an immediate subtype of *rel-cl*) as in (55). In words: a relative clause can be a noun-modifying clause whose head daughter contains an NP gap that is co-indexed with the modified nominal.

```
(55) non-wh-rel-cl ⇒
     [ HEAD   [ MOD N_1 ]
       SLASH  { }
       HD-DTR [ SLASH { NP_1 } ] ]
```
<sup>43</sup>According to Pollard & Sag (1994: 222), the clausal argument of this single argument version of R can either be bare, as here, or marked by *that*. Thus, terminological accuracy demands the observation that for Pollard & Sag some instances of *that*-relatives are actually "bare" in the sense of containing neither a relative pronoun nor a complementiser (though others, in particular those involving relativisation of a top level subject, are analysed as containing a version of *that* which is actually a relative pronoun). See above footnote 35.


This licenses structures like that in Figure 10.<sup>44</sup>

This differs from Kim's proposal for Korean in how the SLASH value is bound off: in particular, where Kim's analysis involves a nominal and a slashed S, Sag's involves a nominal and an *un*slashed S — the clause is [SLASH { }]; it is the VP

<sup>44</sup>Sag also proposes a subtype of (55) to deal with non-finite bare relatives, like (i), which he calls *simple infinitival relatives*, cf. *simp-inf-rel-cl* in Figure 4. See Sag (1997: 469). Abeillé et al. (1998) includes discussion of a similar construction in French — "infinitival *à*-relatives", like (ii):

(i) books for Sam to read \_

(ii) un livre à lire (French)
a book to read
'a book to read'
Neither discussion addresses the special modal semantics associated with non-finites, e.g. (i) means something like "books that Sam can (or should) read".

See Müller (2002: Sections 3.2.4, 3.2.7) for a lexical rule-based analysis of parallel German modal infinitives like (iii).

(iii) ein zu lesendes Buch (German)
a to read book
'a book to be read'

Müller's discussion also leaves the semantics aside, but it seems clear that the semantics must involve embedding the propositional content of the relative under a modal operator, and elsewhere Müller (2006: 871–872; 2007: 112–113; 2010: Section 4.2) has argued that this cannot be handled by inheritance mechanisms, as suggested by Sag and Abeillé et al., so that a lexical rule approach is required. On the limitations of inheritance for semantic embedding, see also Sag, Boas & Kay (2012: 10–12).


which is [SLASH {NP}]. This reflects the fact that in English the gap in the relative clause cannot be the subject, accounting for the contrast in (56).<sup>45</sup>

(56) a. \* person \_ spoke to Sam

b. person who spoke to Sam

The issue of where upwards termination of SLASH inheritance should occur highlights the impossibility of having an entirely lexical and non-constructional account of bare relatives that does not employ empty elements. At first glance, a purely lexical approach might seem simple: since all that is needed are clauses specified as [MOD N] which contain a co-indexed gap, verbs specified as in (57) might seem sufficient.

```
(57) [ HEAD  [ verb, MOD N_1 ]
       SLASH { NP_1 } ]
```

In the absence of special constructions or empty elements, this would license structures like that in Figure 10, except that the upward inheritance of the SLASH value will not be terminated, allowing an additional spurious filler for the gap, as in (58):<sup>46</sup>

(58) \* That book*<sup>i</sup>*, I enjoyed [the book*<sup>i</sup>* Kim read \_*<sup>i</sup>*]
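The difference between the constructional analysis in (55) and the purely lexical entries in (57) comes down to whether anything empties the SLASH set above the relative clause. A toy percolation function (our own encoding) makes the point:

```python
def percolate(tree):
    """Return the SLASH set still visible at the root of a toy tree."""
    slash = set(tree.get("SLASH", set()))
    for d in tree.get("DTRS", []):
        slash |= percolate(d)
    if tree.get("BINDS_SLASH"):   # e.g. a non-wh-rel-cl node, cf. (55)
        slash = set()
    return slash

gap_clause = {"SLASH": {"NP[1]"}}
with_construction = {"DTRS": [{"DTRS": [gap_clause], "BINDS_SLASH": True}]}
purely_lexical = {"DTRS": [{"DTRS": [gap_clause]}]}
print(percolate(with_construction))  # set(): nothing left to fill
print(percolate(purely_lexical))     # {'NP[1]'}: a spurious filler, as in (58)
```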

There is one class of exceptions to this — that is, phrases which might be analysed as relative clauses for which a purely lexical account *is* possible. Examples involving participial phrases and a variety of other post-nominal modifiers, notably APs and PPs, are often called *reduced relatives*, and analysed as a type of relative clause. Sag (1997: 471) follows this tradition (*red-rel-cl* in Figure 4). What this comes down to is the assumption that such examples involve clauses containing predicative phrases with PRO subjects, co-indexed with the nominals they modify.

<sup>45</sup>Examples like (56a) are acceptable in some non-standard dialects of English. Sag suggests this is not problematic, since they could be analysed as reduced relatives (see Sag 1997: 471), but see immediately below where we cast doubt on this. If we are right, then the non-standard dialects would have something like (53) instead of (55).

<sup>46</sup>The SLASH based analysis of Japanese relatives outlined in Sirai & Gunji (1998) manages to avoid this problem, without either special constructions or empty elements, but it is not fully lexical, because it assumes tense affixes combine with the associated lexical verb in the syntax (hence the affix is able to block higher inheritance of the gap introduced by the lexical verb).



It is not obvious to us what is gained by treating these as relative clauses introduced by a special construction. A lexical account seems at least as appealing, where the relevant properties of the phrases (e.g. noun modifying semantics) are projected directly from lexical entries for the head words. The reason such a nonconstructional approach is possible is that such examples involve neither relative pronouns nor genuine gaps, so there are neither REL nor SLASH dependencies to terminate.<sup>47</sup> This approach seems particularly appealing in the cases like (59e), which would be analysed as just involving an attributive adjective (*fond*) which happens to take a complement, along the lines of (60), where { … } stands for the restrictions the adjective itself imposes. But we think a similar account of verbal participles and prepositions is equally plausible.<sup>48</sup>


Notice that in (60) we omit mention of the SUBJ. If we assume the noun-modifying entry is derived from a predicative entry, there are two obvious alternatives: a) that the predicative subject is suppressed; or b) that it is constrained to be unexpressed (i.e. PRO). In the latter case, the two approaches are very similar, the only difference being whether examples like those in (59) are classified as clausal. It is not clear whether this has empirical consequences.

# **3.4 Non-restrictive (supplemental) relatives**

The examples of relative clauses considered so far have been *restrictive relatives* (RRCs); they are interpreted as restricting the denotation of their antecedent to

<sup>47</sup>This argument does not necessarily carry over to languages which allow relativisation of nonsubjects in reduced relatives, such as Arabic. See Melnik (2006: 241).

<sup>48</sup>For example, Müller (2002: 159–164) deals with adjectival passive participles in this way.


a subset of what it would be without the relative clause. So-called *supplemental*, *supplementary*, *appositive*, or *non-restrictive* relatives (NRCs) are different. They do not affect the interpretation of any associated nominal, and are generally interpreted with wide scope, much like independent utterances. For example, if *who understand logic* is read as an NRC as in (61a) it will be interpreted outside the scope of *Kim thinks*.

(61) a. Kim thinks linguists, who understand logic, are clever. (NRC)

b. Kim thinks linguists who understand logic are clever. (RRC)

NRCs are often set off intonationally, and are subject to a number of surface morphosyntactic restrictions in English. In particular, they must be finite and contain a *wh*-pronoun, witness the ungrammaticality of (62a) and (62b).<sup>49</sup>

(62) a. \* Kim, (for Sandy) to speak to, will arrive later.

b. \* Kim, (that) Sandy spoke to, will arrive later.

The analysis of non-restrictive relatives has attracted some attention in the HPSG literature.<sup>50</sup>

Where RRCs are typically nominal modifiers, NRCs are compatible with a wide range of antecedents. Holler (2003) provides an analysis of German nonrestrictive relatives which are adjoined to S, as in (63). Her account uses a version of the empty relativiser from Pollard & Sag (1994) whose MOD value specifies a clausal (rather than nominal) target for modification, and looks for an appropriate antecedent for its first argument (the *wh*-phrase) among the discourse referents contributed by the modification target (for example, the discourse referent corresponding to the proposition expressed by the main clause in (63)). The relative pronoun is thus treated rather like a normal pronoun.

(63) Anna gewann die Schachpartie, was Peter ärgerte. (German)
Anna won the game.of.chess which Peter annoyed
'Anna won the game of chess, which annoyed Peter.'

<sup>49</sup>More extensive discussion of differences between NRCs and RRCs can be found in Arnold (2007).

<sup>50</sup>Bîlbîie & Laurens (2009) discuss what they call *verbless relative adjuncts*, such as (i), in French and Romanian:

(i) Trois personnes, [parmi lesquelles Jean], sont venues. (French)
three people(FEM) among which.FEM John AUX come
'Three people, among which John, have come.'

These have non-restrictive semantics, and some similarities with relative clauses, but Bîlbîie & Laurens point out significant differences, and argue for an analysis that treats them rather differently, as a distinct construction.


Arnold (2004) provides an analysis of English non-restrictive relatives of all kinds. This analysis also takes the relative pronouns involved in NRCs to be much like normal pronouns, but accounts for the syntactic restrictions by making minor modifications to constructions given in Sag's (1997) analysis of restrictive relatives. It assumes a uniform syntax for restrictive relatives and NRCs, but provides a way for relative clauses to combine with the heads they modify in two semantically distinct ways, either restrictively (in the normal way) or nonrestrictively (making their semantic contribution at the same level as the root clause, accounting for the wide-scope interpretation). The fact that supplementary relatives are required to be finite and contain a *wh*-pronoun can then be simply stated (e.g. non-restrictive semantics entails a non-head daughter which is a *fin-wh-rel-cl*).<sup>51</sup> Likewise, the wider range of antecedents available to NRCs can be captured by relaxing the [MOD *noun*] constraint associated with *rel-cl* (so in principle all kinds of relative clause are compatible with any antecedent), and adding it as a requirement associated with restrictive semantics.

The approach to NRCs developed in Arnold (2004) is *syntactically integrated* — NRCs are treated as normal parts of the syntactic structure on a par with restrictive relatives. On the face of it, examples like (64b) are problematic for such an approach:

(64) a. What does Jo think?

b. You should say nothing, which is regrettable.

When uttered in the context provided by (64a), the interpretation of (64b) is that it is regrettable that *Jo thinks* you should say nothing. This has been taken as an indication that the interpretation of NRCs requires antecedents that are not syntactically realised and only available at a level of conceptual structure (see Blakemore 2006). However, Arnold & Borsley (2008) show that this is incorrect, and in fact a syntactically integrated account combined with the approach to ellipsis and fragmentary utterances of Ginzburg & Sag (2000) makes precisely the right predictions in this case and in a range of others.

Arnold & Borsley (2010) look at NRCs where the antecedent is a VP, and where the gap is the complement of an auxiliary, as in (65).

(65) Kim has ridden a camel, which Sam never would \_.

This is unexpected, because such examples seem to involve an NP filler (*which*) being associated with a gap in a position where an NP is generally impossible, cf.

<sup>51</sup>As stated, given Sag's (1997) assumption that *that*-relatives are a variety of *wh*-relative, this wrongly predicts that supplemental *that*-relatives should normally be allowed. One way around this is to adopt a different analysis of *that*, but Arnold (2004) also considers an analysis whereby *that* has a different kind of REL value from "real" relative pronouns.


*\*Sam never would that activity*. Arnold & Borsley consider a number of analyses, including an analysis which treats *which* as a potential VP, and an analysis which introduces a special relative clause construction. However, they argue that the best analysis is one which relates examples like (65) to cases of VP ellipsis (as in *Kim has ridden a camel but Sam never would*), which involve the VP argument of an auxiliary verb being omitted from its COMPS list. The idea is that auxiliary verbs allow such an elided VP argument to have (optionally) a SLASH value that contains an appropriately co-indexed NP. If such a SLASH value is present, normal SLASH amalgamation and inheritance will yield (65) as a normal relative clause, without further stipulation. See also Nykiel & Kim (2021), Chapter 19 of this volume, Kim (2021), Chapter 18 of this volume, Borsley & Crysmann (2021), Chapter 13 of this volume and Sag et al. (2020) for further discussion.
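The core of this proposal can be sketched in a few lines; the encoding is our own simplification, not Arnold & Borsley's formulation:

```python
def elide_vp(aux, with_np_slash=False, index=None):
    """Drop the VP complement; optionally let the elided VP contribute
    a co-indexed NP to SLASH (the option exploited in (65))."""
    vp = next(c for c in aux["COMPS"] if c["CAT"] == "VP")
    out = dict(aux)
    out["COMPS"] = [c for c in aux["COMPS"] if c is not vp]
    out["SLASH"] = [{"CAT": "NP", "INDEX": index}] if with_np_slash else []
    return out

would = {"PHON": "would", "COMPS": [{"CAT": "VP"}], "SLASH": []}
plain = elide_vp(would)                                  # 'but Sam never would'
relative = elide_vp(would, with_np_slash=True, index=5)  # '... which Sam never would _'
print(plain["SLASH"], relative["SLASH"])
# [] [{'CAT': 'NP', 'INDEX': 5}]
```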

NRCs normally follow their antecedents. However, as Lee-Goldman (2012) observes, there are some special cases where the NRC precedes the antecedent. Such cases involve the relative pronouns *which* and *what* with antecedents that have clausal interpretations, i.e. either actual clauses, as in (66a) and (66c), or other expressions interpreted elliptically as with *later* in (66b).

	- b. It may happen now. *What is worse*, it may happen later.
	- c. It may happen now, or *which would be worse* later.

Lee-Goldman provides a constructional account. It makes use of a feature RELZR, introduced by Sag (2010), which is shared between a relative clause and its filler daughter, and whose value reflects the identity of the relative pronoun (so possible values include *which*, *what*, etc.). Cases like (66b) are dealt with simply by means of a special construction which combines a *what*-relative clause with its antecedent in the desired order. The account of cases like (66a) and (66c) makes use of the idea of constituent order domains for linearisation originally proposed by Reape (e.g. Reape 1994, and Müller 2021b: Section 6, Chapter 10 of this volume). The relevant construction combines a phrase whose RELZR value is *which* (e.g. *which would be worse*) with a clause whose constituent order DOMAIN has a coordinator as its first element (e.g. the DOMAIN associated with *or it may happen later*) and produces a phrase where the DOMAIN value of the *which* phrase appears after the coordinator and before the remainder of the clause, giving the desired result.<sup>52</sup>

<sup>52</sup>Lee-Goldman handles the wide scope interpretation of NRCs by implementing a multidimensional notion of CONTENT inspired by Potts (2005). He also extends the analysis described here to deal with cases of *as*-parentheticals (e.g. *As most of you are aware, we have been under severe stress lately*), arguing that *as* should be analysed as a relativiser, and that such clauses should be analysed as relative clauses.


# **4 Other functions, other issues**

For reasons of space, we have so far restricted the notion *relative clause* to the typical case: clauses which are nominal modifiers, adjoined to nominals. This ignores a number of relevant phenomena, notably the fact that relative clauses are not necessarily nominal modifiers, and the possibility that even when they function as nominal modifiers they need not be adjoined to nominals. In this section we will provide some discussion of these issues. Section 4.1 will briefly review HPSG analyses of cases where relative clauses are not adjoined to nominals. Section 4.2 will survey HPSG approaches to cases where clauses resembling relative clauses are not nominal modifiers.<sup>53</sup>

# **4.1 Extraposition**

As noted above, relative clauses are typically nominal modifiers, and typically adjoined to the nominals they modify. However, this is not invariably the case: under certain circumstances relative clauses can be *extraposed*, as in (67), where the relative clauses (emphasised) have been extraposed from the subject NP to the end of the clause.

	- b. Something happened then *(that) I can't really talk about here*.
	- c. Something may arise *for us to talk about*.

Several different approaches to extraposition have been proposed in the HPSG literature.

One approach uses the idea of constituent order domains, mentioned briefly in Section 3.4 above (and see Müller 2021b: Section 6, Chapter 10 of this volume). The idea is that an extraposed relative clause is composed with its antecedent nominal in the normal way as regards syntax and semantics, but that rather than being *compacted* into a single DOMAIN element, the nominal and the relative clause remain separate DOMAIN elements, with the effect that the relative clause can be *liberated* away from the nominal, so that its phonology is realised discontinuously from that of the nominal, as in the examples in (67). See e.g. Nerbonne (1994: Section 2.9) and Kathol & Pollard (1995) for

<sup>53</sup>Among the other phenomena we have neglected, one should mention *amount* relatives (e.g. Grosu & Landman 2017), that is, relative clauses where what is modified semantically is not a nominal, but an *amount* related to the nominal, as for example in (i) where the relative clause gives information about the *amount* of wine, rather than the wine itself.

<sup>(</sup>i) It would take me a year to drink the wine [that Kim drinks on a normal night].


details. Kathol & Pollard's approach is discussed in more detail in Müller (2021b: Section 6.3), Chapter 10 of this volume.

A second approach treats extraposition as involving a nonlocal dependency, introducing a nonlocal feature, typically called something like EXTRA, which functions much like other nonlocal features (e.g. SLASH). The idea is that a relative clause can make its semantic contribution as a nominal modifier "downstairs", but rather than being realised as a syntactic DAUGHTER (sister to the nominal), the relevant properties (e.g. the LOCAL features) are added to the EXTRA set of the head, and inherited up the tree until they are discharged from the EXTRA set by the appearance of an appropriate daughter constituent, which contributes its phonology in the normal way, but makes no semantic contribution. Thinking from the top downwards, this is equivalent to having a construction which allows a relative clause to appear e.g. as sister to a VP (as in (67a)) without affecting the VP's syntax or semantics, so long as it is pushed onto the EXTRA set of the VP, from where it will be inherited downwards until a nominal occurs which it can be interpreted as modifying (the apparatus needed to deal with the "bottom" of the dependency might be a family of lexical items derived by lexical rule, or a non-branching construction). See e.g. Keller (1995), Bouma (1996), Müller (1999a), Müller (2004), Crysmann (2005), and Crysmann (2013). Extraposition is also discussed in Borsley & Crysmann (2021: Section 8), Chapter 13 of this volume.
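A toy rendering of the EXTRA mechanism may help fix ideas. The encoding below is ours and deliberately ignores everything except the introduction and discharge of the dependency:

```python
def introduce_extra(nominal, rel_desc):
    """Combine semantically 'downstairs' but defer syntactic realisation
    by placing the relative clause's description on EXTRA."""
    return {"CAT": nominal["CAT"],
            "SEM": (nominal["SEM"], rel_desc["SEM"]),
            "EXTRA": [rel_desc]}

def discharge_extra(head, daughter):
    """Realise an extraposed daughter, removing it from the EXTRA set."""
    assert daughter in head["EXTRA"], "nothing to discharge"
    return {"CAT": head["CAT"], "SEM": head["SEM"],
            "EXTRA": [e for e in head["EXTRA"] if e is not daughter]}

rel = {"CAT": "S[rel]", "SEM": "cant_talk_about(x)"}
np = introduce_extra({"CAT": "NP", "SEM": "something(x)"}, rel)
vp = {"CAT": "VP", "SEM": "happened(x)", "EXTRA": np["EXTRA"]}  # inherited up
clause = discharge_extra(vp, rel)   # the relative clause surfaces here
print(clause["EXTRA"])              # []: the dependency has been discharged
```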

A third approach is suggested in Kiss (2005), and adopted in Crysmann (2004) and Walker (2017). This approach exploits the more flexible approach to semantic composition provided by Minimal Recursion Semantics (MRS, Copestake et al. 2005), in the case of Kiss (2005), and Lexical Resource Semantics (LRS, Richter & Sailer 2004) in Walker (2017). See also Koenig & Richter (2021), Chapter 22 of this volume for a discussion of both of these semantic representation languages. The idea is that an extraposed relative clause appears as a normal syntactic daughter in its surface position, but the notion of semantic modification is generalised so that rather than the index of a modifying phrase being identified with that of a sister constituent (as standardly assumed), it may be identified with that of any suitable constituent *within* the sister. That is, adjuncts can be interpreted as modifying not just their sisters, but anything *contained in* their sisters — words and phrases to which they have no direct syntactic connection. This is implemented by means of a set-valued ANCHORS feature, which is inherited upwards in the manner of a nonlocal feature, and which allows access to the indices of constituents from lower down. The flexibility of semantic composition afforded by MRS and LRS means that the right interpretations can be obtained. See also


Borsley & Crysmann (2021: Section 8.3), Chapter 13 of this volume for a more detailed discussion of Kiss's (2005) approach.
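The core of the ANCHORS idea, the upward collection of candidate indices, can likewise be sketched briefly (our own illustration, abstracting away from the MRS and LRS details):

```python
def anchors(node):
    """Collect the indices of all constituents dominated by node."""
    found = {node["INDEX"]} if "INDEX" in node else set()
    for d in node.get("DTRS", []):
        found |= anchors(d)
    return found

np = {"CAT": "NP", "INDEX": "x1"}            # 'something'
vp = {"CAT": "VP", "DTRS": [{"CAT": "V"}]}
s  = {"CAT": "S", "DTRS": [np, vp]}
# an extraposed relative adjoined to S may pick any index in ANCHORS(S):
print("x1" in anchors(s))                    # True
```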

A number of authors have argued for the superiority of an approach using EXTRA-style apparatus (e.g. Crysmann 2013, Borsley & Crysmann 2021: Section 8, Chapter 13 of this volume), but in terms of theoretical costs and benefits there seems to be little to choose between these alternatives<sup>54</sup> — the first and third approaches rely on particular approaches to constituent order and semantic composition, while EXTRA-style analyses involve only the more commonplace apparatus of nonlocal features (though with the added cost of special constructions or lexical operations to introduce and remove elements from EXTRA sets). Empirically, there are several issues that all approaches deal with more or less successfully (for example, the Right Roof Constraint from Ross 1967: Section 5.1.2 that prevents extraposition beyond the clause, cf. (68b)). However, a more significant factor may be how well different accounts integrate with analyses of extraposition involving other kinds of adjunct and complement (e.g. complement clauses, as in (69)), capturing similarities and differences (see e.g. Crysmann 2013).


# **4.2 Other functions**

In this section we will briefly discuss phenomena involving clauses whose internal structures resemble relative clauses but which do not function as nominal modifiers.<sup>55</sup>

<sup>54</sup>Müller (2004) looks at the computational processability of linearisation-based grammars vs. grammars with continuous constituents, that is, grammars using the EXTRA mechanism. He prefers the linearisation approach for computational reasons. However, the linearisation approach was later given up for theoretical reasons having to do with the analysis of German clause structure (Müller 2005; 2021c).

<sup>55</sup>One omission here is discussion of *relative-correlative* constructions, which can be found in Hindi and Marathi, *inter alia*, and which were given an analysis in Pollard & Sag (1994: 227–232). These involve the paratactic combination of a clause that contains one or more relative pronouns, and what looks like a main clause containing coreferential pronouns, something like 'which boy*i* saw which girl*j*, he*i* proposed to her*j*' (meaning *the boy who saw the girl proposed to her*). Pollard & Sag's analysis involves a set of indices in the REL value of the first clause, which are realised by relative pronouns in the normal way, and an identical set of indices encoded as the value of a CORRELATIVE feature in the main clause, which are realised by normal pronouns.


### **4.2.1 Complement clauses**

Perhaps the most obvious cases of this kind involve clauses with the internal structure of a relative clause which occur as complements, rather than adjuncts. The following are some examples.<sup>56</sup>



In (70a) we have what looks like a *that* relative which is plausibly analysed as the complement of the superlative (notice that omitting the superlative makes (70a) ungrammatical).


The German example in (70b) exemplifies the *diejenigen* class of determiners, which require a complement that looks like a relative clause (and is analysed as such in Walker 2017).

In (70c) we have a so-called *it*-cleft, a construction which features a clause resembling a relative clause, but rather than adding information about an associated nominal (as it would if it were a normal relative clause), the clause is interpreted as providing a presupposition ("someone/something solved the problem") for an associated focus phrase (here the nominal *Kim*, so the interpretation is roughly "… and that person/thing was Kim"). Notice that the focus phrase need not be nominal (e.g. in (70d) it is a PP *from Kim*); again, this is unlike normal (restrictive) relative clauses (which are nominal modifiers).<sup>57</sup> In HPSG, following Pollard & Sag (1994: 260–262), *it*-clefts have typically been analysed as involving a lexical entry for *be* that takes an *it* subject, and two complements: an XP and an S which is marked as containing an XP gap. This makes *it*-clefts look rather different from relative clauses (the only real similarity being the existence of an unbounded dependency). One problem is that it is not clear how this approach

<sup>56</sup>Another case where a relative clause should be analysed as a complement is discussed in Arnold & Lucas (2016).

<sup>57</sup>Notice also that *that*-relatives are usually incompatible with proper name antecedents, but proper names are perfectly acceptable as the focus of an *it*-cleft with a *that*-clause, as in (70c) (Huddleston & Pullum 2002: 1416–1417).


can be extended to examples like (71), where we seem to have an NP focus (*Sam*) which is not directly associated with an XP gap — we have instead a PP gap that seems to be associated with a normal relative phrase filler (*on whom*), i.e. where the similarity of the clefted clause to a relative clause is quite strong. It is not obvious how this problem should be dealt with.

(71) It was Sam [on whom she particularly focused her attention \_].

The French example in (70e) contains a so-called *predicative relative clause* (PRC).<sup>58</sup> Such clauses have the superficial form of a finite relative clause, but differ from them syntactically, semantically, and pragmatically. Koenig & Lambrecht (1999) analyse them as a form of *secondary predicate* (cf. *running away* in English *We saw them running away*). Syntactically, they are restricted to postverbal positions, and are only permitted with certain kinds of verb (notably verbs of perception, like *voir* 'see', and discovery, like *trouver* 'find'), and the relative pronoun must be a top level subject. Semantically, they are subject to constraints on tense, modality, and negation (there must be temporal overlap between the perception/discovery event and the event reported in the relative clause, and the relative clause content cannot be either modal or negative). Pragmatically, their content must be asserted (rather than presupposed). Koenig & Lambrecht provide an analysis which treats PRCs as REL marked clauses with both an internal and an external subject (instances of *head-subject-phrase* which have a non-empty SUBJ value), and which can consequently function as secondary predicates.

### **4.2.2 Dependent noun and pseudo-relative constructions**

The following exemplifies a Korean structure that contains what looks superficially like a relative clause:

(72) Kim-un [[sakwa-ka cayngpan-wi-ey iss-nun] kes]-ul mek-ess-ta. (Korean)
Kim-TOP apple-NOM tray-TOP-LOC exist-MOD KES-ACC eat-PST-DECL
'Kim ate an apple which was on the tray.'

Here what is traditionally called a *dependent noun* (*kes*) is preceded by a clause whose verb bears the morphological marking that is characteristic of relative clauses (the -*nun* affix).<sup>59</sup>

<sup>58</sup>The French term is *proposition relative dépendante attribut* (Sandfeld 1965: 139).

<sup>59</sup>Japanese has a similar construction, involving the nominalising particle *no*, which has received some attention in the HPSG literature (e.g. Kikuta 1998; 2001; 2002). A difference is that there is no special morphology on the clause in Japanese, as noted above, in Section 3.3.1.


However, unlike a normal relative clause, this "dependent" clause does not contain a gap; instead it contains what might be regarded as the semantic head of the construction (in this case, *sakwa-ka* 'apple'). Notice that the clause+*kes* constituent satisfies the selectional restriction of the verb *mek-ess-ta* 'ate'; this is what motivates the translation and explains why such clauses are often regarded as "internally headed" relatives. Kim (2016b: 303–317) notes a number of differences between *kes*-clauses and normal relatives (e.g. *kes*-clauses do not allow the full range of relative affixes to appear), and suggests these clauses are better analysed as complements of *kes*. See also Kim (1996), Chan & Kim (2003), Kim (2016a), and references there.<sup>60</sup>

Another Korean structure that has some similarity with relative clauses is the so-called *pseudo-relative* construction, exemplified in (73).<sup>61</sup>

	- 'the smell that characterises the burning of rubber'

There is again no gap in the relative clause; again, only certain kinds of relative affix are allowed on the verb (here only *-nun*); and only a limited range of nouns allow this kind of relative clause; this makes them rather like complement clauses. However, it is less plausible to think of a noun like *naymsay* 'smell' taking a complement (unlike *kes*), and these clauses are like prototypical relative

<sup>60</sup>Pollard & Sag (1994: 232–236) discuss a number of cases of what appear to be more plausible instances of *internally headed* relatives from a number of languages (Lakhota, Dogon, and Quechua); the following is from Dogon:

(i) [ya indɛ mi wɛ gɔ] yimaa boli. (Dogon)
yesterday person 1SG see.PN.∅ DEF die.PSP go.PN.3SG
'The person I saw yesterday is dead.'

Here we have a determiner *gɔ* preceded by a clause containing what would be the external head of a standard relative clause (in this case *indɛ* 'person'). The key difference between this and the Korean case is the absence here of any obvious clause-external nominal like *kes* which can be treated as the head which takes the relative clause as a complement. Pollard & Sag (1994: 234) suggest (following Culy 1990) that NPs like that in (i) involve an exocentric construction, but no empty elements (neither an empty nominal, nor an empty relativiser). The NP consists of a determiner and a nominal, where the nominal consists of just a clause whose REL value contains the index of the nominal. This REL value is inherited downwards into the clause, where it is identified with the index of one of the NPs, here the index of *indɛ* 'person': the effect of this is that the index of *indɛ* 'person' becomes the index of the whole NP. (This summary ignores a number of technical and empirical issues that have to do with the inheritance and binding-off of REL values.)

<sup>61</sup>A similar construction can be found in Japanese (cf. Kikuta 1998; 2001; 2002; Chan & Kim 2003).


clauses in not allowing topic marking. Kim suggests this is a special construction where the relation between the head noun and the relative clause is that the noun describes the perceptive result of the situation described by the clause (e.g. the smell is the perceptive result of the rubber burning). See Kim (1998b), Yoon (1993), Chan & Kim (2003), Cha (2005), and Kim (2016b).

### **4.2.3 Free relatives**

Perhaps the most significant case of a clause type that resembles a relative clause but which does not function as a nominal modifier consists of the so-called *free* (*headless*, or *fused*) *relatives*, exemplified in (74). These have received considerable attention in the HPSG literature.

(74) a. She ate *what I suggested*.
	- b. She ate *whatever I suggested*.
	- c. She put it *where I suggested*.

As these examples suggest, free relatives can be interpreted as involving either definite descriptions, as in (74a) *the thing that I suggested*, or universal quantification, as in (74b) *everything that I suggested*. They can also have adverbial or prepositional interpretations, as in (74c) *in the place that I suggested*. The interpretation is related to the choice of *wh*-phrase. There are some special restrictions. For example, in English free relatives must be finite, as can be seen from (75a), and there are restrictions on what *wh*-words are allowed (e.g. *what* is permitted, as in (74a), but *which* is not, witness (75b)).

	- b. \* She ate *which I suggested*.

Free relatives resemble prototypical *wh*-relatives (and interrogative clauses) in containing a gap, and an initial *wh*-phrase which is interpreted as filling the gap. They differ from interrogatives in having the external distribution of NPs or other phrases (e.g. PPs, AdvPs, etc) rather than clauses (for example in (74a) *what I suggested* is the complement of *eat*, and in (74c) *where I suggested* is a complement of *put*, neither of which allow clausal complements). They differ from prototypical relative clauses in not being associated with a nominal antecedent. They can contain relative pronouns which are not permitted in normal *wh*-relatives, notably the *-ever* pronouns, *whatever*, *whoever*, etc., and *what*, witness the ungrammaticality of the following:<sup>62</sup>

<sup>62</sup>*What* is not a relative pronoun in standard English, but it is in some other varieties, and (76b) is grammatical in those.


(76) a. \* She ate the thing(s) *whatever I suggested*.

b. \* She ate the thing(s) *what I suggested*.

In general the possibilities of relative inheritance (pied-piping) in free relatives are dramatically reduced compared to prototypical relatives and interrogatives. For example in English, relative inheritance is not possible from the complement of a preposition, as can be seen from (77b):

(77) a. Try to describe *what you talked about*.

b. \* Try to describe *about what you talked*.

In fact, in English relative inheritance only seems to be possible from *wh*-phrases in pre-nominal position (determiners and genitive NPs), as in (78), and (80a) below.<sup>63</sup>

(78) They will steal *what(ever) things they can carry*.

As with prototypical relatives, the initial *wh*-phrase in a free relative has to satisfy restrictions imposed "downstairs" in the relative clause (i.e. restrictions that follow from the location of the gap). In addition, however, it seems that with free relatives the *wh*-phrase is also sensitive to restrictions imposed from outside the relative clause — the *wh*-phrase of a free relative has to be of the appropriate category for the position where the free relative appears. For example, as a first approximation, a free relative with *what* is only possible where an NP is possible, and a free relative with *where* is only possible where a locative PP is possible. This is the so-called *matching effect* in free relatives.<sup>64</sup>

One interesting instance of this involves case marking. Consider, for example, the German data in (79). These show a free relative in a position which requires nominative case marking, containing a relative pronoun whose role within the relative clause requires nominative marking. Since *wer* 'who' is nominative, all is well. By contrast, in (79b) while the nominative *wer* satisfies the requirements within the relative clause, there is a case conflict because the free relative as a whole is the complement of a verb *vertrauen* 'trust' that requires a dative complement. The result is ungrammatical. Examples like (79c) show a complication.

<sup>63</sup>Other languages are less restrictive, e.g. Müller (1999a: 57) gives German examples analogous to (77b). See footnote 66.

<sup>64</sup>In fact, things are more complicated. For example, in *He walked to* [*where his horse was waiting*] we have a free relative with *where* in an NP position (object of a preposition) rather than a PP position. See e.g. Kim (2017: 382–383) for discussion.


Here again there is a case conflict: within the relative clause, the relative pronoun is required to be accusative (complement of *empfehlen* 'recommend'), and the free relative as a whole is in a nominative position. However, the result is grammatical, presumably because the morphological form of the neuter relative pronoun *was* 'what' can realise either nominative or accusative case (unlike the masculine *wer*).

	- b. \* Wer klug ist, vertraue ich immer.
		who.NOM clever is trust I ever
		Intended: 'I trust whoever is clever.'
	- c. Was du mir empfiehlst, macht einen guten Eindruck.
		what.NOM/ACC you me recommend makes a good impression
		'What you recommend to me makes a good impression.'

The agreement properties of free relatives are somewhat surprising, and reveal a potential complication in the matching effect. Notice that in (80a) the *wh*-phrase, *whoever's dogs*, is plural, and triggers plural agreement on the verb in the relative clause.

(80) a. [Whoever's dogs are running around] is in big trouble.
	- b. Whoever is/\*are running around (is in trouble).

This is not surprising since *whoever's dogs* is headed by a plural noun (*dogs*). However, the free relative as a whole triggers singular agreement, consistent with the agreement properties coming from the relative pronoun — *whoever* is singular, as can be seen from (80b). This is also consistent with the semantics: the free relative in (80a) denotes the person whose dogs are running around, not the dogs (in this it resembles an NP like *anyone whose dogs are running around*, which involves a normal relative clause construction).<sup>65</sup> This shows a complication of the matching effect: it seems that within-clause requirements are reflected on the initial *wh*-phrase (*whoever's dogs* is the subject of the relative), but the external distribution reflects the properties of the relative *word* (*whoever*). Of course, the fact that relative inheritance is so limited in free relatives means that usually the *wh*-phrase consists of just the *wh*-word, so that it is very difficult to tease these things apart.<sup>66</sup>

<sup>65</sup>This is not a universal property: Borsley (2008) notes that examples in Welsh resembling (80a) are interpreted as meaning that the dogs are in big trouble, not the owner.

Following Müller (1999a) on German, free relatives have received considerable attention in the HPSG literature, with analyses dealing with a variety of languages, including: Arabic (Alqurashi 2012; Hahn 2012), Danish (Bjerre 2012; 2014), English (Kim & Park 1996; Kim 2001; Wright & Kathol 2003; Francis 2007; Yoo 2008; Kim 2017), German (Hinrichs & Nakazawa 2002; Kubota 2003), Persian (Taghvaipour 2005), and Welsh (Borsley 2008).

The central analytic problem is this: leaving aside the complication arising from case syncretism and relative inheritance just mentioned, the existence of matching effects has suggested to some (e.g. Kubota 2003) that the *wh*-phrase should be the head of the free relative, because the distribution of free relatives depends on the properties of the *wh*-phrase. So, for example, the NP *what* would be the head of *what I suggested*. But this is inconsistent with *what* being the filler of the gap in *what I suggested* (i.e. the missing object of *suggested*), because in a normal filler-gap construction the filler is *not* the head. If, instead, we assume that *what* is primarily the filler of the gap in the free relative, then we should assume that the clause *I suggested* \_ is the head of the free relative — and the distributional properties of the free relative are unexplained.

### **4.2.4 Pseudo-clefts and transparent free relatives**

Two constructions that show some similarity with free relatives, and have received some attention in the HPSG literature, are *specificational pseudo-clefts*, exemplified in (81), and so-called *transparent free relatives* (TFRs), exemplified in (82).

<sup>66</sup>Müller (1999a: 90) discusses the following German example of a free relative with an initial PP containing the nominal relative word *wem* 'whom' (i.e. showing relative inheritance to PP):

(i) Ihr könnt beginnen, [mit wem ihr (beginnen) wollt]. (German)
	you can start with whom you (start) want
	'You can start with whoever you like.'

He observes that the free relative functions as a PP, just like *mit wem*, and in the variant where the parenthesised instance of *beginnen* is present, the within-clause role is also that of a PP. Note also that German and other languages have mismatching free relative clauses, that is, the requirements of the downstairs verb differ from those of the upstairs verb. For example, a free relative clause with a PP object as relative phrase can function as an accusative object in the matrix clause (Bausewein 1991: 154; Müller 1999a: 61). Müller (1999a: 96) accounts for this by assuming a schema that explicitly does not project the category of the relative phrase but a related category. No account assuming any of the material in the relative clause to be the head can account for these data. This excludes certain HPSG analyses and also Minimalist approaches to free relative clauses like the ones suggested by Donati (2006), Ott (2011) and Chomsky (2008; 2013). See Müller (2020: Section 4.6.2) for further discussion and Borsley & Müller (2021: Section 3.4), Chapter 28 of this volume for a brief summary.

	- b. [What Kim will be wearing] is a new coat.
	- c. [What she did] was cut her hair.
	- d. [What she did not bring] was any wine.
	- b. Her reply was [what anyone would consider \_ belligerent].

Specificational pseudo-clefts typically consist of a *wh*-clause, *be*, and a *focal phrase* (e.g. *any wine* in (81d)). The focal phrase corresponds to a gap in the *wh*-clause (e.g. in (81d) *any wine* is interpreted as the missing object of *bring*). They raise a number of issues that are not typical of relative clauses, notably the existence of *connectivity effects* whereby the focal phrase behaves as though it was part of the *wh*-clause (e.g. in (81d) the negative polarity item *any* is licensed by the negation in the *wh*-clause). Beyond this, it is not obvious whether the *wh*-clauses should be analysed as related to interrogatives, as in Yoo (2003), or as related to free relatives, as in Gerbl (2007: especially Chapter 3 and Chapter 4).<sup>67</sup>

In TFRs the relative appears to function somewhat like a parenthetical modifier of a *nucleus* (e.g. *a belligerent tone* in (82a)), which seems to provide the head properties of the phrase as a whole — so for example the TFR in (82a) has the characteristics of an NP, that in (82b) has those of an AP (it is a natural starting point to assume the nucleus is internal to the relative clause, since otherwise one has the puzzle of a relative clause which is both incomplete and occurs before the head it modifies). TFRs are in some ways even more restricted than other kinds of relative (only *what* is allowed as the relative expression), but in others less restricted (e.g. free relatives have the external distribution of NPs, but the TFR in (82b) has the distribution of an AP, like its nucleus *belligerent*). Some approaches to TFRs employ novel kinds of structure (e.g. *grafts*, cf. van Riemsdijk 2006), but Yoo (2008) and Kim (2011) provide HPSG analyses which capture the relevant properties using the existing apparatus with only minor adjustments.

<sup>67</sup>It can be difficult to distinguish this kind of pseudo-cleft from cases involving a normal free relative. An example like *What she is wearing is a mess* is superficially similar to (81b), but it involves a free relative. Notice, for example, it can be paraphrased with a normal NP plus relative clause (as "The thing that she is wearing is a mess") and *what* can be replaced with *whatever*. It does not have a paraphrase with an *it*-cleft or a simple proposition — it cannot be paraphrased as "It is a mess that she is wearing" or "She is wearing a mess".


# **5 Conclusion**

The analysis of relative clauses has been important in the theoretical evolution of HPSG, notably in the development of a constructional approach involving inheritance from cross-classifying dimensions of description. Empirically, relative clauses have been the focus of a significant amount of descriptive work in a variety of typologically diverse languages. Our goal in this paper has been exposition and survey rather than argumentation towards particular conclusions, but, perhaps paradoxically given what we have just said, we think one conclusion that clearly emerges is that, from an HPSG perspective at least, *relative clauses are not a natural kind*. There is *nothing* one can say that will be true of everything that has been described as a "relative clause" in the literature. As regards internal structure, some are *head-filler* structures (*wh*-relatives), while others are *head-complement* structures (complementiser relatives, some kinds of bare relative); correspondingly, some involve relative pronouns (hence a REL feature), some do not. It is true that most involve some kind of SLASH dependency, but this is hardly unique to relative clauses, and even this does not hold of the dependent noun and pseudo-relatives mentioned in Section 4.2.2. There is no semantic unity — while restrictive relatives are noun-modifiers, non-restrictive relatives function more like independent clauses, and free relatives have nominal or adverbial semantics. Similarly, as regards external distribution: prototypical relatives are noun modifiers, and appear in *head-adjunct-phrase* structures, but expressions with similar internal structure occur as complements (e.g. free relatives, clefts, and complements of superlative adjectives).

We do not think it is a bad thing that this conclusion should emerge from a discussion of HPSG approaches. Rather, it suggests to us that an approach that tries to impose unity will end up being procrustean. In fact, discussion of relative clauses seems to us to show some of the best features of HPSG — the analyses we have summarised are generally well formalised, carefully constructed (detailed, precise, and coherent), and both empirically satisfying and insightful, with relatively few *ad hoc* assumptions or special stipulations. The discussion shows how the expressivity and flexibility of the descriptive machinery of the framework are compatible with a wide range of phenomena across a range of languages.


# **Abbreviations**


# **Acknowledgements**

We are grateful to the editors of this volume and two anonymous referees for their careful and insightful comments on earlier versions of this contribution. Remaining flaws are our sole responsibility, of course.

# **References**




# **Chapter 15**

# **Island phenomena and related matters**

# Rui P. Chaves

University at Buffalo, SUNY

Extraction constraints on long-distance dependencies – so-called *islands* – have been the subject of intense linguistic and psycholinguistic research for the last half century. Despite their importance in syntactic theory, the heterogeneity of island constraints has posed many difficult challenges to linguistic theory, across all frameworks. The HPSG perspective on island phenomena is that they are unlikely to be due to a unitary syntactic constraint, given the fact that virtually all such island constraints have known exceptions. Rather, it is more plausible that island constraints result from a combination of independently motivated syntactic, semantic, pragmatic and processing phenomena. The present chapter is somewhat different from others in this volume in that its focus is not on HPSG analyses of some phenomena, but rather on the nature of the phenomena themselves. This is because there is evidence that most of the phenomena are not purely grammatical, and to that extent are independent from HPSG or indeed any theory of grammar. One may call this view of island phenomena "minimalist" in the sense that much of it does not involve formal grammar.

# **1 Introduction**

This chapter provides an overview of various island effects that have received attention from members of the HPSG community. I begin with the extraction constraints peculiar to coordinate structures, because they not only have a special status in the history of HPSG, but also because they illustrate well the non-unitary nature of island constraints. I then argue that, at a deeper level, some of these constraints are in fact present in many other island types, though not necessarily all. For example, I take it as relatively clear that *factive islands* are purely pragmatic in nature (Oshima 2007), as are *negative islands* (Kroch 1998; Szabolcsi & Zwarts 1993; Abrusán 2011; Fox & Hackl 2006; Abrusán & Spector 2011), although one can quibble about the particular technical details of how such accounts are best articulated. Similarly, the *NP Constraint* in the sense of Horn (1972) is likely to be semantic-pragmatic in nature (Kuno 1987; Godard 1988; Davies & Dubinsky 2009; Strunk & Snider 2013; Chaves & King 2020). Conversely, I take it as relatively uncontroversial that the *Clause Non-Final Incomplete Constituent Constraint* is due to processing difficulty (Hukari & Levine 1991; Fodor 1992). See also Kothari (2008), Ambridge & Goldberg (2008), and Richter & Chaves (2020) for evidence that "bridge" effects in filler-gap dependencies are at least in part due to pragmatics.

Rui P. Chaves. 2021. Island phenomena and related matters. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 665–723. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599846

In the present chapter I focus on islands that have garnered more attention from members of the HPSG community, and that have caused more controversy cross-theoretically. My goal is to provide an overview of the range of explanations that have been proposed to account for the complex array of facts surrounding islands, and to show that no single unified account is likely. For a more comprehensive overview of islands and related phenomena see Chaves & Putnam (2020: Chapter 3).

# **2 Background**

As already detailed in Borsley & Crysmann (2021), Chapter 13 of this volume, HPSG encodes filler-gap dependencies in terms of a set-valued feature SLASH. Because the theory consists of a feature-based declarative system of constraints, virtually all that goes on in the grammar involves constraints stating which value a given feature takes. By allowing SLASH sets to be identified (or unioned), it follows that constructions in which multiple gaps are linked to the same filler are trivially obtained, as in (1).

	- b. Which celebrity did you expect [[the pictures of ] to bother the most]?
	- c. Which celebrity did you [inform [that the police was coming to arrest ]]?
	- d. Which celebrity did you [compare [the memoir of ] [with a movie about ]]?
	- e. Which celebrity did you [hire [without auditioning first]]?
	- f. Which celebrity did you [[meet at a party] and [date for a few months]]?
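To make this mechanism concrete, the following minimal sketch models SLASH propagation in Python. It is an expository toy of my own, not the actual formalism (real SLASH values are sets of LOCAL objects rather than category labels, and amalgamation is stated lexically): gaps introduce a SLASH member, phrases pass the union of their daughters' SLASH values up, and a head-filler structure discharges a member, so several gaps can be bound by a single filler, as in (1).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sign:
    """A drastically simplified sign: a category label plus a set-valued
    SLASH feature (toy stand-in for sets of LOCAL objects)."""
    label: str
    slash: frozenset = frozenset()

def gap(cat: str) -> Sign:
    """A gap: phonologically empty, its own category is in SLASH."""
    return Sign(cat, frozenset({cat}))

def phrase(label: str, *dtrs: Sign) -> Sign:
    """Mother's SLASH is the union of the daughters' SLASH values."""
    return Sign(label, frozenset().union(*(d.slash for d in dtrs)))

def filler_head(filler: Sign, head: Sign) -> Sign:
    """Head-filler structure: the filler discharges a matching SLASH member."""
    assert filler.label in head.slash, "filler does not match any gap"
    return Sign(head.label, head.slash - {filler.label})

# (1f): Which celebrity did you [[meet _ at a party] and [date _ ...]]?
meet = phrase("VP", Sign("V"), gap("NP"))   # meet _ at a party
date = phrase("VP", Sign("V"), gap("NP"))   # date _ for a few months
coord = phrase("VP", meet, date)            # both gaps linked to one dependency
body = phrase("S", Sign("NP"), coord)       # did you ...
top = filler_head(Sign("NP"), body)         # which celebrity ...
print(top.slash)                            # frozenset(): dependency saturated
```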


But another advantage of encoding the presence of filler-gap dependencies as a feature is that certain lexical items and constructions can easily impose idiosyncratic constraints on SLASH values. For example, to account for languages that do not allow preposition stranding, it suffices to state that prepositions are necessarily specified as [SLASH { }]. Thus, their complements cannot appear in SLASH instead of COMPS. The converse also occurs. Certain uses of the verb *assure*, for example, are lexically required to have one complement in SLASH rather than in COMPS. Thus, extraction is obligatory as (2) shows, based on Kayne (1984: 4).

	- b. Who can you assure me \_ to be the most competent?
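Continuing the toy sketch above (again a simplification of mine, with French *avec* 'with' standing in as an illustrative non-stranding preposition), a lexical [SLASH { }] specification amounts to a word-level check:

```python
def preposition(form: str, comp: Sign) -> Sign:
    """A preposition in a non-P-stranding language: lexically [SLASH { }],
    so its complement may be overt but never a gap."""
    pp = phrase("PP", Sign(form), comp)
    assert not pp.slash, f"'{form}' is [SLASH {{ }}]: no stranding allowed"
    return pp

preposition("avec", Sign("NP"))       # overt NP complement: fine
try:
    preposition("avec", gap("NP"))    # stranded gap: ruled out lexically
except AssertionError as err:
    print(err)
```

The *assure* case in (2) would be the mirror image: the lexical entry itself places one complement in SLASH, so a derivation without a filler never saturates.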

As we shall see, it would be rather trivial to impose the classic island constraints in the standard syntactic environments in which they arise.<sup>1</sup> The problem is that island effects are riddled with exceptions which defy purely syntactic accounts of the phenomena. Hence, HPSG has generally refrained from assuming that islands are syntactic, in contrast to Mainstream Generative Grammar.

# **3 The Coordinate Structure Constraint**

Ross (1967) first observed that coordinate structures impose various constraints on long-distance dependencies, shown in (3), collectively dubbed the *Coordinate Structure Constraint*. For perspicuity, I follow Grosu (1973a) in referring to (i) as the *Conjunct Constraint* and to (ii) as the *Element Constraint*.

(3) Coordinate Structure Constraint (CSC):

In a coordinate structure, (i) no conjunct may be moved, (ii) nor may any element contained in a conjunct be moved out of that conjunct … unless each conjunct properly contains a gap paired with the same filler.

The *Conjunct Constraint* (CC) is illustrated by the unacceptability of the extractions in (4). No such constraint is active in other constructions like those in (5) and (6), for example.

	- b. \* Which celebrity did you see [ and Priscilla]? (cf. with 'Did you see Elvis and Priscilla?')

<sup>1</sup> In such a view, island effects could perhaps result from grammaticized constraints, induced by parsing and performance considerations (Pritchett 1991; Fodor 1978; 1983).


	- b. Which celebrity did you see with Elvis? (cf. with 'Did you see Priscilla with Elvis?')
	- b. Which celebrity did you say Robin arrived earlier than ?

In HPSG accounts of extraction that assume the existence of traces (Pollard & Sag 1994; Levine & Hukari 2006) the CC must be stipulated at the level of the coordination construction, by stating that conjuncts cannot be empty elements.<sup>2</sup> On the other hand, the CC follows immediately in a traceless account of fillergap dependencies (Sag & Fodor 1995; Bouma et al. 2001; Ginzburg & Sag 2000; Sag 2010) since there is simply nothing to conjoin in (4), and thus nothing else needs to be said about conjunct extraction; see Sag (2000) for more criticism of traces.

HPSG's traceless account of the CC is semantic in nature, in a sense. Coordinators like *and*, *or*, *but* and so on are not regarded as heads that select arguments, and therefore have empty ARG-ST and valence specifications. And given that HPSG assumes that the signs that can appear in a given lexical head's SLASH value are its valents, it follows that the signs that coordinators combine with cannot instead be registered in the coordinator's SLASH feature. Hence, words like *and* have no valents and no arguments, and therefore there can be no conjunct extraction. Incidentally, adnominal adjectives cannot be extracted either, for exactly the same reason: they are not selected by any head, and therefore are not listed in any ARG-ST list.

In order to allow certain adverbials to be extractable, Ginzburg & Sag (2000) assume that those adverbials are members of ARG-ST. See Levine & Hukari (2006) for more on adverbial extraction, and see Borsley & Crysmann (2021), Chapter 13 of this volume for further discussion.<sup>3</sup>

<sup>2</sup>See however Levine (2017: 317–318) for the claim that each conjunct must contain at least one stressed syllable. Given that traces are phonologically silent, nothing is there to bear stress and the CC is obtained. This raises the question of why no such stress constraint exists in P-stranding, for example, or indeed in any kind of extraction.

<sup>3</sup>The empirical facts are less clear when it comes to adnominal PPs, however. Even PPs that are usually regarded as modifiers can sometimes be extracted, as in *From which shelf am I not supposed to read any books?* In many such extractions the PP can alternatively be parsed as VP modifier, which complicates judgements. See also De Kuthy (2002: 176) and De Kuthy (2021: 1066), Chapter 23 of this volume on NP-PP split.


Let us now turn to the *Element Constraint*, illustrated in (7). As before, the constraint appears to be restricted to coordination structures, as no oddness arises in the comitative counterparts like (8), or in comparatives like (9).<sup>4</sup>

	- b. \* Which celebrity did you see [a picture of and Priscilla]? (cf. with 'Did you see a picture of Elvis and Priscilla?')
	- b. Which celebrity did you see [Priscilla with the brother of ]?
	- b. Which celebrity did you say that [[the sooner we take a picture of ], [the quicker we can go home]]?

The Across-The-Board (ATB) exception to the CSC is illustrated by the acceptability of (10), where each conjunct hosts a gap, linked to the same filler. As already noted above in (1), the fact that multiple gaps can be linked to the same filler is not unique to coordination.

(10) a. Which celebrity did you buy [[a picture of ] and [a book about ]]?
	- b. Which celebrity did you [[meet at a party] and [date for a few months]]?

Gazdar (1981) and Gazdar et al. (1985) assumed that the coordination rule requires SLASH values to be structure-shared across conjuncts and the mother node, thus predicting both the Element Constraint and the ATB exceptions. The failure of movement-based grammar to predict multiple gap extraction facts was also seen as a major empirical advantage of GPSG/HPSG. A similar constraint is assumed in Pollard & Sag (1994: 202) and Beavers & Sag (2004: 60), among others, illustrated in (11). See Abeillé & Chaves (2021), Chapter 16 of this volume for more discussion about coordination.

<sup>4</sup>Although Winter (2001: 83) and others claim that coordination imposes semantic scope islands, Chaves (2007: §3.6) shows that this is not the case, as illustrated in examples like those below.

	- b. We had to do this ourselves. By the end of the year, some student [[had proof-read every document] and [corrected each theorem]]. (∀ doc-theorem *>* ∃ student / ∃ student *>* ∀ doc-theorem)
	- c. Your task is to document the social interaction between [[each female] and [an adult male]]. (∀ female *>* ∃ adult male / ∃ adult male *>* ∀ female)

(11) Coordination Schema (abbreviated):
	*coordinate-phrase* ⇒ [SYNSEM|NONLOCAL|SLASH ①, DTRS ⟨[SYNSEM|NONLOCAL|SLASH ①], [SYNSEM|NONLOCAL|SLASH ①]⟩]

Because the SLASH value ① is structure-shared between the mother and the daughters in (11), all three nodes must bear the same SLASH value. This predicts the CSC and the ATB exceptions straightforwardly. The failure of Mainstream Generative Grammar to predict these and related multiple gap extraction facts in a precise way is regarded as one of the major empirical advantages of HPSG over movement-based accounts.
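As a toy rendering of (11), reusing the Sign, gap, and phrase helpers sketched in Section 2 (and with the caveat that the real schema works by structure sharing rather than by after-the-fact equality checking), identical SLASH values license ATB extraction, while a gap in only one conjunct is rejected:

```python
def coordinate(*conjuncts: Sign) -> Sign:
    """Toy version of (11): mother and daughters share one SLASH value."""
    values = {c.slash for c in conjuncts}
    assert len(values) == 1, "Element Constraint: SLASH mismatch across conjuncts"
    return Sign(conjuncts[0].label, values.pop())

# ATB, as in (10a): a gap in each conjunct, bound by the same filler.
atb = coordinate(phrase("NP", Sign("Det"), gap("NP")),
                 phrase("NP", Sign("Det"), gap("NP")))
print(atb.slash)                  # frozenset({'NP'}): one shared dependency

# Element Constraint violation, as in (7b): gap in only one conjunct.
try:
    coordinate(phrase("NP", Sign("Det"), gap("NP")), Sign("NP"))
except AssertionError as err:
    print(err)
```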

But the facts about extraction in coordination structures are more complex than originally assumed, and than (11) allows for. A crucial difference between the Conjunct Constraint and the Element Constraint is that the latter is only in effect if the coordination has a symmetric interpretation (Ross 1967; Goldsmith 1985; Lakoff 1986; Levin & Prince 1986), as in (12).<sup>5</sup>

	- b. Who did Lizzie Borden [[take an ax] and [whack to death]]?
	- c. How much can you [[drink ] and [still stay sober]]?

The coordinate status of (12) has been questioned since Ross (1967). After all, if these are subordinate structures rather than coordinate structures, then the possibility for non-ATB long-distance dependencies ceases to be exceptional. But as Schmerling (1972), Lakoff (1986), Levine (2001) and Kehler (2002: Chapter 5) point out, there is no empirical reason to assume that the examples in (12) are anything other than coordination structures.

Another reason to reject the idea that the SLASH values of the daughters and the mother node are simply equated in ATB extraction is the fact that sometimes multiple gaps are "cumulatively" combined into a "pluralic gap".<sup>6</sup> As an example, consider the extractions in (13). There are two possible interpretations for such extractions: one in which the *ex situ* signs (i.e. the gap signs) and the filler phrase are co-indexed, and therefore co-referential, and a second reading in which the two *ex situ* phrases are not co-indexed even though they are linked to the same filler phrase. Rather, the filler phrase refers to a plural referent composed of the referents of the *ex situ* signs, as indicated by the subscripts in (13). For some speakers the preferred reading is the former, for others the latter, often depending on the example.

<sup>5</sup>In asymmetric coordination, the order of the conjuncts has a major effect on the interpretation. Thus, *Robin jumped on a horse and rode into the sunset* does not mean the same as *Robin rode into the sunset and jumped on a horse*. Conversely, in symmetric coordination the order of the conjuncts leads to no interpretational differences, as illustrated by the paraphrases *Robin drank a beer and Sue ate a burger* and *Sue ate a burger and Robin drank a beer*.

	- b. [Which city]{*i*,*j*} did Jack travel to \_*i* and Sally decide to live in \_*j*? (answer: 'Jack traveled to London and Sally decided to live in Rome.')
	- c. [Who]{*i*,*j*} did the pictures of \_*i* impress \_*j* the most? (answer: 'Robin's pictures impressed Sam the most.')
	- d. [Who]{*i*,*j*} did the rivals of \_*i* shoot \_*j*? (answer: 'Robin's rivals shot Sam.')
	- e. [Who]{*i*,*j*} did you send nude photos of \_*i* to \_*j*? (answer: 'I sent photos of Sam to Robin.')

In conclusion, the non-ATB exceptions in (12) suggest that the coordination rule should not constrain SLASH at all, as argued for in Chaves (2003). Rather, the Element Constraint, its ATB exceptions in (10a,b) and the asymmetric non-ATB exceptions in (12) are more likely to be the consequence of an independent semantic-pragmatic constraint that requires the filler phrase to be "topical" relative to the clause (Lakoff 1986; Kuno 1987; Kehler 2002; Kubota & Lee 2015). Thus, if the coordination is symmetric, then the topicality requirement distributes over each conjunct, to require that the filler phrase be topical in each conjunct. Consequently, extraction must be ATB in symmetric coordination. No distribution needs to take place in asymmetric coordination, and thus both ATB and non-ATB extraction is licit in asymmetric coordination. For an attempt to transfer some of Kuno's and Kehler's insights into HPSG see Chaves (2003). In the latter proposal, the coordination rule is like most other rules in the grammar in that it says nothing about the SLASH values of the mother and the daughters, along the lines of Levine & Hukari (2006: 354). In other words, the constraints on SLASH in (11) are unnecessary. Rather, pragmatics is the driving force behind how long-distance dependencies propagate into one or more conjuncts, depending on whether the coordination is interpreted symmetrically or not.

<sup>6</sup>See for example Munn (1998; 1999), Postal (1998: 136, 160), Kehler (2002: 125), Gawron & Kehler (2003), Zhang (2007), Chaves (2012a), and Vicente (2016).

Let us take stock. The CSC does not receive a unitary account in modern HPSG, given that the Conjunct Constraint and the Element Constraint are of a very different nature. Whereas the former does not admit ATB extraction, and is predicted by a traceless analysis, the latter allows ATB extraction as seen by the contrast between (4c) and (10). Upon closer inspection, the Element Constraint and the ATB exceptions are semantic-pragmatic in nature. As we shall see, a similar conclusion is plausible for various other island phenomena.

# **4 Complex NP Constraint**

The Complex NP Constraint concerns the difficulty in extracting out of complex NPs formed with either relative clauses (14) or complement phrases (15).

	- b. \* [Which language] did they hire [someone [who speaks \_]]? (cf. with 'Did they hire someone who speaks Arabic?')
	- b. \* What did you believe [the rumor [that Ed disclosed \_]]? (cf. with 'Did you believe the rumor that Ed disclosed *that*?')

It is tempting to prevent extractions out of adnominal clauses by simply stipulating that the SLASH value of the modifier must be empty, as (16) illustrates. Perhaps, along the lines of Fodor (1978; 1983), Berwick & Weinberg (1984), and Hawkins (1999; 2004), processing difficulties lead to the grammaticization of such a constraint, effectively blocking any modified head from hosting any gaps.

(16) Head-Modifier Schema (abbreviated):
	[HEAD-DTR [SYNSEM ①], NON-HEAD-DTRS ⟨[SYNSEM [LOC|CAT|HEAD|MOD ①, NONLOC|SLASH { }]]⟩]
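In the toy notation of the sketches in Sections 2 and 3, the stipulation in (16) would be a one-line check on the modifier daughter; as the rest of this section shows, however, such a hard constraint is empirically too strong:

```python
def head_modifier(head: Sign, modifier: Sign) -> Sign:
    """Toy rendering of (16): the modifier daughter must be [SLASH { }]."""
    assert not modifier.slash, "CNPC stipulation: no gap inside a modifier"
    return Sign(head.label, head.slash)

# (14b): '* Which language did they hire [someone [who speaks _]]?'
rel_clause = phrase("RelCl", Sign("who"), gap("NP"))   # who speaks _
try:
    head_modifier(Sign("NP"), rel_clause)              # someone + relative
except AssertionError as err:
    print(err)
```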

However, the robustness of the CNPC has been challenged by various counterexamples over the years (Ross 1967: 139; Pollard & Sag 1994: 206–207; Kluender 1998; Postal 1998: 9; Sag et al. 2009). The sample in (17) involves acceptable extractions from NP-embedded complement CPs (some of which are definite), and (18) involves acceptable extractions from NP-embedded relative clauses.<sup>7</sup>

	- b. [Which rebel] leader would you favor [a proposal [that the CIA assassinate \_]]?<sup>9</sup>
	- c. [Which company] did Simon spread [the rumor [that he had started \_]]?
	- d. [What] did you get [the impression [that the problem really was \_]]?
	- a. [This is the kind of weather] that there are [many people [who like \_]].
	- b. Violence is something that there are [many Americans [who condone \_]].<sup>11</sup>
	- c. There were several old rock songs that she and I were [the only two [who knew \_]].<sup>12</sup>
	- d. This is the chapter that we really need to find [someone [who understands \_]].<sup>13</sup>
	- e. Which diamond ring did you say there was [nobody in the world [who could buy \_]]?<sup>14</sup>
	- f. John is the sort of guy that I don't know [a lot of people [who think well of \_]].<sup>15</sup>

<sup>7</sup>Counterexamples to the CNPC can be found in a number of languages, including Japanese and Korean (Kuno 1973; Nishigauchi 1999), Akan (Saah & Goodluck 1995), Danish (Erteschik-Shir 1973: Chapter 2), Swedish (Allwood 1976; Engdahl 1982), Norwegian (Taraldsen 1982) and Romance languages (Cinque 2010). In some languages that have support verb constructions, the CNPC is apparently not active, which is consistent with a complex predicate analysis for such constructions (Abeillé & Vivès 2021).

<sup>8</sup>Ross (1967: 139)

<sup>9</sup>Pollard & Sag (1994: 206)

<sup>10</sup>Erteschik-Shir & Lappin (1979: 58)

<sup>11</sup>McCawley (1981: 108)

<sup>12</sup>Sag (1997: 454)

<sup>13</sup>Kluender (1992: 238)

<sup>14</sup>Pollard & Sag (1994: 206)

<sup>15</sup>Culicover (1999: 230)


In the above counterexamples, the relative clauses contribute to the main assertion of the utterance, rather than expressing background information. For example, (18a) asserts 'There are many people who like this kind of weather', and so on. Some authors have argued that it is precisely because such relatives express new information that the extraction can escape the embedded clause (Erteschik-Shir & Lappin 1979; Kuno 1987; Deane 1992; Goldberg 2013). If this is correct, then the proper account of CNPC effects is not unlike that of the CSC. In both cases, the information structural status of the clause that contains the gap is crucial to the acceptability of the overall long-distance dependencies.<sup>16</sup>

In addition to pragmatic constraints, Kluender (1992; 1998) proposed that processing factors also influence the acceptability of CNPC violations. Consider for example the acceptability hierarchy in (19); more specific filler phrases increase acceptability, whereas the presence of more specific phrases between the filler and the gap seems to cause increased processing difficulty, and therefore lowers the acceptability of the sentence. The symbol '*<*' reads as "is less acceptable than".

	- b. What do you need to find an expert who can translate ? *<*
	- c. What do you need to find someone who can translate ? *<*
	- d. Which document do you need to find an expert who can translate ?

There is on-line sentence processing evidence that CNPC violations with more informative fillers are more acceptable and are processed faster at the gap site than violations with less informative fillers (Hofmeister & Sag 2010), as in (20).

	- b. Which military dictator did you say that nobody in the world could ever depose ?

<sup>16</sup>Although it is sometimes claimed that such island effects are also active in logical form and semantic scope (May 1985; Ruys 1993; Fox 2000; Sabbagh 2007; Bachrach & Katzir 2009), there is reason to be skeptical. For example, the universally quantified noun phrases in (i) and (ii) are embedded in a relative clause but can have wide scope over the indefinite *someone*, constituting a semantic CNPC violation. Note that these relatives are not presentational, and therefore are not especially permeable to extraction.

(ii) John was able to find someone who is willing to learn every Germanic language that we intend to study. (Chaves 2014: 853)


The same difference in reading times is found in sentences without CNPC violations, in fact. For example, (21b) was found to be read faster at *encouraged* than (21a). Crucially, that critical region of the sentence is not in the path of any filler-gap dependency.

	- b. The diplomat contacted the ruthless military dictator who the activist looking for more contributions encouraged to preserve natural habitats and resources.

Given that finite tensed verbs can be regarded as definite, and infinitival verbs as indefinite (Partee 1984), and given that finiteness can create processing difficulty (Kluender 1992; Gibson 2000), acceptability clines like (22) are to be expected. See Levine & Hukari (2006: Chapter 5) and Levine (2017: 308) for more discussion.

	- b. Who did you wonder what to say to ? *<*
	- c. Which of the people at the party did you wonder what to say to ?

# **4.1 On D-Linking**

The amelioration caused by more specific (definite) *wh*-phrases as in (19d), (20b) and (22c) has been called a "D-Linking" effect (Pesetsky 1987; 2000). It purportedly arises if the set of possible answers is pre-established or otherwise salient. But there are several problems with the D-Linking story. First, there is currently no non-circular definition of D-Linking; see Pesetsky (2000: 16), Ginzburg & Sag (2000: 247–250), Chung (1994: 33, 39) and Levine & Hukari (2006: 242, 268–271). Second, the counterexamples above in (19d), (20b) and (22c) are given out of the blue, and therefore cannot evoke any preexisting set of referents, as D-Linking requires. Furthermore, nothing should prevent D-Linking with a bare *wh*-item, as Pesetsky himself acknowledges, but on the other hand there is no experimental evidence that context can lead to D-linking of a bare *wh*-phrase (Sprouse 2007; Villata et al. 2016).<sup>17</sup>

Kluender & Kutas (1993), Sag et al. (2009), Hofmeister (2007a,b) and Hofmeister & Sag (2010) argue that more definite *wh*-phrases improve the acceptability of extractions because they resist memory decay better than indefinites, and are compatible with fewer potential gap sites. In addition, Kroch (1998) and Levine & Hukari (2006: 270) point out that D-Linking amelioration effects may simply result from the plausibility of background assumptions associated with the proposition.

<sup>17</sup>For more detailed criticism of D-Linking see Hofmeister et al. (2007).

# **4.2 On memory limitations**

Sprouse et al. (2012a) use *n*-back and serial recall tasks to argue that there is no evidence that working memory limitations correlate with island acceptability, and therefore that the "processing-based" account of islands put forth by Kluender (1992; 1998), Kluender & Kutas (1993), Hofmeister & Sag (2010) and others is unfounded. To be sure, it cannot be stressed enough that the accounts in Kluender (1992) and Hofmeister & Sag (2010) are not strictly based on performance, and involve other factors as well, most notably plausibility and pragmatic factors. See in particular Hofmeister et al. (2013: 49), where it is argued that at least some extraction constraints may be due to a combination of syntactic, semantic, pragmatic, and performance factors. Basically, if the correct location of a gap is syntactically, semantically, or pragmatically highly unlikely in that particular utterance, then it is less likely for the sentence to be acceptable. Indeed, there is independent experimental evidence that speakers attend to probabilistic information about the syntactic distribution of filler-gap dependencies (van Schijndel et al. 2014), and that gap predictability is crucial for on-line processing of islands (Michel 2014).<sup>18</sup> But as Sprouse et al. (2012b) point out, there is no reason to believe that *n*-back and serial recall tasks are strongly correlated with working memory capacity to begin with. Second, one of the main points of Hofmeister & Sag (2010) is that the literature on experimental island research has not systematically controlled for multiple factors that can impact the processing and comprehension of complex sentences. If the experimental items are excessively complex, then readers are more likely to give up understanding the utterances and subtler effects will not be measurable. Phillips (2013b), however, regards such concerns as irrelevant. Although it is unclear to what extent expectations and processing constraints contribute to island effects, it is likely that they play some role in CNPC effects, as well as other island types discussed below.

<sup>18</sup>More broadly, there is good evidence that speakers deploy probabilistic information when processing a variety of linguistic input, including words (Altmann & Kamide 1999; Arai & Keller 2013; Creel et al. 2008; DeLong et al. 2005; Kutas & Hillyard 1984), lexical categories (Gibson 2006; Levy & Keller 2013; Tabor et al. 1997), syntactic structures (Levy et al. 2012; Lau et al. 2006; Levy 2008; Staub & Clifton, Jr. 2006), semantics (Altmann & Kamide 1999; Federmeier & Kutas 1999; Kamide et al. 2003), and pragmatics (Ni et al. 1996; Mak et al. 2008; Roland et al. 2012).


# **5 Subjacency**

One general constraint called Subjacency (Chomsky 1973: 271; 1986: 40; Baltin 1981; 2006) was introduced to try to capture many of the constraints discussed here. The claim was that movement cannot cross so-called bounding nodes, and what exactly counts as a bounding node was assumed to be a language-specific parameter in a universal principle. Theoretically, this seems questionable, since it requires innate knowledge involving part-of-speech information (Müller 2020: Section 13.1.5.1; Newmeyer 2004: 539–540), and specific claims concerning German and English are empirically wrong as well, as Müller (2004), Müller (2007), Meurers & Müller (2009) and Strunk & Snider (2013) showed with corpus examples.<sup>19</sup>

The original claim by Baltin (1981) and Chomsky (1986: 40) was that the extraposed relative clauses in (23) can only be interpreted as referring to the embedding NP, that is, an assumed extraposition starts in t′ rather than t.

	- b. [NP Many proofs [PP of [the theorem t]] t′] appeared [that I wanted to think about].

The authors assume that NP, PP, VP and AP are bounding nodes for rightward movement in English and that the unavailable interpretation is ruled out by the Subjacency Principle (Baltin 1981: 262). However, the attested examples in (24) show that subjacency does not hold for extraposition out of NPs or PPs. The examples in (24a–c) are adapted from Strunk & Snider (2013: 106, 109, 111), and those in (24d–f) are from Chaves (2014: 863).

	- b. We drafted [a list of basic demands ] last night [that have to be unconditionally met or we will go on strike].
	- c. For example, we understand that Ariva buses have won [a number of contracts for routes in London ] recently, [which will not be run by low floor accessible buses].
	- d. Robin bought [a copy of a book ] yesterday [about ancient Egyptian culture].

<sup>19</sup>The observation that arbitrarily many NP nodes can be crossed by extraposition goes back to at least Koster (1978: 52), who discussed Dutch data.



# **6 Right Roof Constraint**

Rightward movement is traditionally regarded as being clause bounded. Such *Right Roof Constraint* (Ross 1967: Section 5.1.2) effects are illustrated in (25), in which a phrase appears *ex situ* in a position to the right of its *in situ* counterpart; see Akmajian (1975), Baltin (1978), and Stowell (1981), among others.

	- b. \* [[That a review \_ came out yesterday] is catastrophic] [of this book].<sup>21</sup>
	- c. \* It was believed \_ that [there walked into the room \_ ] [by everyone] [a man with long blond hair].<sup>22</sup>

When treated as a form of extraction, rightward movement has been predominantly accounted for via a feature EXTRA(POSED) (Keller 1994; 1995; Bouma 1996; Van Eynde 1996; Müller 1999: Section 13.2; Kim & Sag 2005), rather than by SLASH. Thus, Right Roof Constraint (RRC) island effects can be easily modeled by stipulating that the EXTRA value of an S node must be empty. One way to do so is to state that any S dependent (valent or adjunct) must be [EXTRA ⟨⟩]. Thus, no extraposed element may escape its clause. However, the oddness of (25) may not be due to any such syntactic stipulation, given the acceptability of counterexamples like (26). Note that the adverbial interveners in such examples do not require parenthetical prosody. Conversely, even strong parenthetical prosody on the adverbs in (25) fails to improve those sentences.

	- b. I've [been wanting to [meet someone who KNOWS ] [ever since I was little]] [exactly what happened to Amelia Earhart].<sup>24</sup>

<sup>20</sup>Chaves (2014: 861)

<sup>21</sup>Rochemont (1992: 375)

<sup>22</sup>Rochemont (1992: 386)

<sup>23</sup>Adapted from Kayne (1998: 167).

<sup>24</sup>Chaves (2014: 861), adapted from Gazdar (1981: 177)


c. I've [been wondering [if it is possible ] [for many years now]] [for anyone to memorize the Bible word for word].<sup>25</sup>

The durative semantics of *I've been wanting/requesting/wondering* raises an expectation about the realization of a durative adverbial expression like *ever since* or *for many years* that provides information about the durative semantics of the main predicate. Hence, the adverb is cued by the main predication, in some sense, and coheres much better with a high attachment than with a low one.
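For concreteness, the syntactic stipulation mentioned above can also be mimicked in the toy style of the earlier sketches, with EXTRA as a second set-valued feature. This is my own simplification of the EXTRA-based analyses cited above, and the exceptions in (26) and (27) are precisely what it fails to allow:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class XSign:
    """Toy sign with an EXTRA feature collecting extraposed dependencies."""
    label: str
    extra: frozenset = frozenset()

def clausal_dependent(s: XSign) -> XSign:
    """Toy RRC stipulation: any S serving as a dependent must be
    [EXTRA ⟨⟩], so extraposition cannot escape its own clause."""
    assert not s.extra, "Right Roof Constraint: unresolved extraposition"
    return s

clausal_dependent(XSign("S"))                         # fine: nothing pending
try:
    clausal_dependent(XSign("S", frozenset({"PP"})))  # extraposed PP pending
except AssertionError as err:
    print(err)
```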

The fact that the RRC is prone to exceptions has been noted by multiple authors, as the sample in (27) illustrates. In all such cases, a phrase is right-extracted from an embedded clause, which should be flat-out impossible if extraposition is clause-bounded. Again, the adverbial interveners in (27) do not require any special prosody, which means that these data cannot be easily discarded as parenthetical insertions.

	- b. I have [wanted [to meet ] for many years] [the man who spent so much money planning the assassination of Kennedy].<sup>27</sup>
	- c. Sue [kept [regretting ] for years] [that she had not turned him down].<sup>28</sup>
	- d. She has been [requesting that he [return ] [ever since last Tuesday]] [the book that John borrowed from her last year].<sup>29</sup>
	- e. Mary [wanted [to go ] until yesterday] [to the public lecture].<sup>30</sup>

Grosu (1973b), Gazdar (1981) and Stucky (1987) argued that the RRC is the result of performance factors such as syntactic and semantic parsing expectations and memory resource limitations, not grammar proper. Indeed, we now know that there is a general, well-known tendency for the language processor to prefer attaching new material to the more recent constituents (Frazier & Clifton, Jr. 1996; Gibson et al. 1996; Traxler et al. 1998; Fodor 2002; Fernández 2003). Moreover, eye-tracking studies like Staub et al. (2006) indicate that the parser is reluctant to adopt extraposition parses. This explains why extraposition in written texts is less common in proportion to the length of the intervening material (Uszkoreit et al. 1998): the longer the structure, the bigger the processing burden. Crucially, however, the preference for the closest attachment can be weakened by many factors (Fernández 2003; Desmet et al. 2006; De Vicenzi & Job 1993; Carreiras 1992). For example, Levy et al. (2012) show that relative clause extraposition creates significant processing difficulty when compared with non-extraposed counterparts of the same sentences, but that a preceding context that sets up a strong expectation for a relative clause modifying a given noun can facilitate comprehension of an extraposed relative clause modifying that noun. In other words, in spite of a larger processing burden, some extrapositions can be made easier to process by parsing expectations.

<sup>25</sup>Chaves (2014: 861)

<sup>26</sup>Attributed to Witten (1972) in Postal (1974: 92n).

<sup>27</sup>Attributed to Janet Fodor (p.c.) in Gazdar (1981: 177).

<sup>28</sup>Van Eynde (1996)

<sup>29</sup>Adapted from Kayne (1998: 167).

<sup>30</sup>Lasnik (2009)

A detailed account of extraposition island phenomena does not exist in any framework, as far as I am aware. But the line of inquiry first proposed by Grosu (1973b), Gazdar (1981) and Stucky (1987), and later experimentally supported by Levy et al. (2012) and Strunk & Snider (2013), seems to be on the right track. If so, then there is no syntactic constraint on EXTRA. Rather, RRC effects are to a large extent the result of difficulty in integrating the extraposed phrase *in situ*.

# **7 Freezing**

A related island phenomenon also involving rightward displacement, first noted in Ross (1967: 305), is *Freezing*: leftward extraction (28a) and extraposition (28b) cause low acceptability when they interact, as seen in (29). In (29a) there is extraction from an extraposed PP, in (29b) there is extraction from an extraposed NP, and in (29c) an extraction from a PP crossed with direct object extraposition.

	- b. \* Who did you [give \_ [to Robin] [a picture of \_]]?
	- c. \* Who did you [give \_ [to \_] [a picture of my brother]]?

Fodor (1978: 457) notes that (29c) has a syntactically highly probable temporary alternative parse in which *to* combines with the NP *a picture of my brother*. The existence of this local ambiguity likely disrupts parsing, especially as it occurs in a portion of the sentence that contains two gaps in close succession. Indeed, constructions with two independent gaps in close proximity are licit, but not trivial to process, as seen in (30), especially if the extraction paths cross (Fodor 1978), as in (30b).

	- b. ? Who can't you remember which papers you sent copies of \_ to \_ ?

A similar analysis is offered by Hofmeister et al. (2015: 477), who note that constructions like (29c) must cause increased processing effort since the point of retrieval and integration coincides with the point of reanalysis. The existence of a preferential alternative parse that is locally licit but globally illicit can in turn lead to a "digging-in" effect (Ferreira & Henderson 1991; 1993; Tabor & Hutchins 2004), in which the more committed the parser becomes to a syntactic parse, the harder it is to backtrack and reanalyze the input. The net effect of these factors is that the correct parse of (29c) is less probable and therefore harder to identify than that of (29b), which suffers from none of these problems, and is regarded as more acceptable than (29c) by Fodor (1978: 453) and others. See Chaves (2018) for experimental evidence that speakers can adapt and to some extent overcome some of these parsing biases.

Finally, prosodic and pragmatic factors are likely also at play in (29), as in the RRC. Huck & Na (1990) show that when an unstressed stranded preposition is separated from its selecting head by another phrase, oddness ensues for prosodic reasons. Moreover, Huck & Na (1990) and Bolinger (1992) also argue that freezing effects are in part due to a pragmatic conflict created by extraposition and extraction: *wh*-movement has extracted a phrase leftward, focusing interest on that expression, while at the same time extraposition has moved a constituent rightward, focusing interest on that constituent as well. Objects tend to be extraposed when they are discourse-new, and even more so when they are heavy (Wasow 2002: 71). Therefore, the theme phrase *a picture of my brother* in (29c) is strongly biased to be discourse-new, but this clashes with the fact that an entirely different entity, the recipient, is leftward extracted, and therefore is the *de facto* new information that the open proposition is about. No such mismatch exists in (29a) or (29b), in contrast, where the extraposed theme is more directly linked to the entity targeted by leftward extraction.

# **8 Subject islands**

Extraction out of subject phrases like (31) is broadly regarded to be impossible in several languages, including English (Chomsky 1973: 106), an effect referred to as a *Subject Island* (SI). This constraint is much less severe in languages like Japanese, German, and Spanish, among others (Stepanov 2007; Jurka et al. 2011; Goodall 2011; Sprouse et al. 2015; Fukuda et al. 2018; Polinsky et al. 2013).

	- b. \* Who was a picture of laying there?<sup>32</sup>
	- c. \* Who do you think pictures of would please John?<sup>33</sup>
	- d. \* Who does the claim that Mary likes upset Bill?<sup>34</sup>
	- e. \* Which candidate were there posters of all over town?<sup>35</sup>

However, English exceptions were noticed early on, and have since accumulated in the literature. In fact, for Ross (1967), English extractions like (32a) are not illicit, and more recently Chomsky (2008: 147) has added more such counterexamples. Other authors noted that certain extractions from subject phrases are naturally attested, as in (32b,c). Indeed, Abeillé et al. (2018) show that extractions like those in (32c) are in fact acceptable to native speakers, and that no such island effect exists in French either.<sup>36</sup>

	- b. They have eight children [of whom] [five \_] are still living at home.<sup>38</sup>
	- c. Already Agassiz had become interested in the rich stores of the extinct fishes of Europe, especially those of Glarus in Switzerland and of Monte Bolca near Verona, [of which] , at that time, [only a few \_] had been critically studied.<sup>39</sup>

English exceptions to the SI constraint are not restricted to PP extractions, however. Although Ross (1967: 265) claimed NP extractions from NP subjects like (33) are illicit, it was arguably premature to generalize from such a small sample.

<sup>31</sup>Chomsky (1977: 106)

<sup>32</sup>Kayne (1981: 114)

<sup>33</sup>Huang (1982: 497)

<sup>34</sup>Lasnik & Saito (1992: 42)

<sup>35</sup>Lasnik & Park (2003: 651)

<sup>36</sup>For completeness, other authors argue that PP extractions from NP subjects are illicit, such as Lasnik & Park (2003: 653), among many others.

<sup>37</sup>Ross (1967: 242)

<sup>38</sup>Huddleston et al. (2002: 1093)

<sup>39</sup>Encyclopaedia Britannica Online, Agassiz, (Jean) Louis (Rodolphe). Quoted from Santorini (2007)


	- b. \* Which cars were the hoods of damaged by the explosion?

Indeed, a number of authors have noted that some NP extractions from subject NPs are either passable or fairly acceptable, as illustrated in (34). See also Pollard & Sag (1994: 195, ft. 32), Postal (1998), Sauerland & Elbourne (2002: 304), Culicover (1999: 230), Levine & Hukari (2006: 265), Chaves (2012b: 470, 471), and Chaves & Dery (2014: 97).

	- b. It's [the kind of policy statement] that [jokes about \_] are a dime a dozen.<sup>41</sup>
	- c. There are [certain topics] that [jokes about \_] are completely unacceptable.<sup>42</sup>
	- d. [Which car] did [some pictures of \_] cause a scandal?<sup>43</sup>
	- e. [What] did [the attempt to find \_] end in failure?<sup>44</sup>
	- f. [Which president] would [the impeachment of \_] cause more outrage?<sup>45</sup>
	- g. I have a question that [the probability of you knowing the answer to \_] is zero.<sup>46</sup>

Whereas SI violations involving subject CPs are not attested, those involving infinitival VP subjects like (35) are. See Chaves (2012b: 471) for more natural occurrences.

	- b. In his bedroom, [which] [to describe \_ as small] would be a gross understatement, he has an audio studio setup.<sup>48</sup>

<sup>40</sup>Kluender (1998: 268)

<sup>41</sup>Levine et al. (2001: 204)

<sup>42</sup>Levine & Sag (2003: 252, ft. 6)

<sup>43</sup>Jiménez–Fernández (2009: 111)

<sup>44</sup>Hofmeister & Sag (2010: 370)

<sup>45</sup>Chaves (2012b: 467)

<sup>46</sup>Chaves (2013: 305)

<sup>47</sup>Huddleston et al. (2002: 1094, ft. 27)

<sup>48</sup>http://pipl.com/directory/name/Frohwein/Kym, accessed 2021-04-03, quoted from Chaves (2013: 303)


c. They amounted to near twenty thousand pounds, [which] [to pay \_] would have ruined me.<sup>49</sup>

Incidentally, subject phrases are not extraposition islands either, as shown in (36) from Chaves (2014: 864). See also Guéron & May (1984). The oddness of examples like \*[*Pictures* \_] *frighten people* [*of John*] from Drummond (2011: 46) is more likely due to a digging-in effect, caused by speakers assuming that the subject is syntactically and semantically complete by the end of the verb phrase.

	- b. [A copy of a new book \_] arrived yesterday [about ancient Egyptian culture] .

# **8.1 Clausal Subject Constraint**

Let us now consider SI effects involving more complex subjects. Infinitival subject clauses seem to impose no SI constraint, an observation going back to Kuno & Takami (1993), but noted elsewhere a few times:

	- b. I just met Terry's eager-beaver research assistant [who] for us to talk to \_ about any subject other than linguistics – would be absolutely pointless.<sup>51</sup>
	- c. There are [people in this world] that for me to describe \_ as despicable – would be an understatement.<sup>52</sup>

Infinitival subjects contrast dramatically with finite subjects. The latter are renowned for being particularly hard to extract from, as shown in (38). Ross (1967: 243) dubbed this extreme kind of SI the *Sentential Subject Constraint* (SSC). See also Chomsky (1973), Huang (1982), Chomsky (1986), and Freidin (1992).<sup>53</sup>

<sup>49</sup>Benjamin Franklin, William Temple Franklin and William Duane. 1834. Memoirs of Benjamin Franklin, vol 1. p. 58

<sup>50</sup>Kuno & Takami (1993: 49)

<sup>51</sup>Levine & Hukari (2006: 265)

<sup>52</sup>Chaves (2012b: 468)

<sup>53</sup>That said, Chaves (2013) reports that some native speakers find SSC violations like (i) to be fairly acceptable, again raising some doubt about the robustness of English SI effects:

(i) [Which actress] does [whether Tom Cruise marries \_] make any difference to you?


	- b. \* [Who] did [that Robin married \_] surprise you? (cf. with 'Did that Robin married Sam surprise you?')

There are some functional reasons why clausal SI violations may be so strong. First, subject clauses are notorious for being particularly difficult to process, independently of extraction: they are often stylistically marked, as (39a) illustrates, and it is extremely hard to embed a clausal subject within another clausal subject, even though such constructions ought to be perfectly grammatical, as in (39b, c). In addition, it is known that tense can induce greater processing costs (Kluender 1992; Gibson 2000).

	- b. \* That that Joe left bothered Susan surprised Max.<sup>55</sup>
	- c. \* That that the world is round is obvious is dubious.<sup>56</sup>

Interestingly, clausal subjects become more acceptable if extraposed, as shown in (40). The explanation offered by Fodor et al. (1974: 356–357) is that speakers tend to take the initial clause in the sentence to be the main clause: *that* is taken to be the subject, but the remainder of the structure does not fit this pattern. A sentence like (40a) thus causes increased processing load because it has a structure different from the one the parser expects. This processing problem does not arise in the counterpart in (40b).<sup>57</sup>

	- b. It is dubious that [it is obvious that [the world is round]].<sup>58</sup>

<sup>54</sup>Gibson (1991: 57)

<sup>55</sup>Kimball (1973: 33)

<sup>56</sup>Kuno (1974: 119)

<sup>57</sup>See Gibson (2006) for online evidence that the word *that* is preferentially interpreted as a determiner even in syntactic contexts where it cannot be a determiner. The use of "determiner" corresponds to the traditional term, referring to a certain category of prenominal constituent rather than to the whole nominal phrase including the noun and all its dependents. Gibson's evidence suggests that top-down (syntactic) expectations are independent of bottom-up (lexical) frequency-based expectations in sentence processing. Thus, a clausal subject phrase starting with the complementizer *that* is likely to be misparsed as a matrix clause with sentence-initial pronominal or determiner *that*.

<sup>58</sup>Kuno (1974: 130)


Indeed, Fodor & Garrett (1967), Bever (1970), and Frazier & Rayner (1988) also show that extraposed clausal subject sentences like (41a) are easier to process than their *in-situ* counterparts like (41b). Not surprisingly, the former are much more frequent than the latter, which explains why the parser would expect the former more than the latter.

	- b. That Mary was happy surprised Max.

If we add a filler-gap dependency to a sentence that already is complex by virtue of having a clausal subject, the resulting structure may be too difficult to parse. This point is illustrated by the contrast in (42).

	- b. What does his coming prove?<sup>59</sup>

As argued by Davies & Dubinsky (2009), the low acceptability of extraction in subject-auxiliary inversion sentences with clausal subjects is more likely to be the result of extragrammatical factors than of grammatical conditions. For example, not all extractions like (43b) are unacceptable, as Delahunty (1983: 382–387) and Davies & Dubinsky (2009: 115) point out.

	- b. \* Who did that the food that John ordered tasted good please \_?

The evidence discussed so far suggests that sentences involving extraction and clausal subjects are odd at least in part due to the likely cumulative effect of various sources of processing complexity. Sentences with sentential subjects are unusual structures, which can mislead the parser into the wrong analysis. A breakdown in comprehension can occur because the parser must hold complex incomplete phrases in memory while processing the remainder of the sentence. The presence of a filler-gap dependency will likely only make the sentence harder to process. It is independently known that the more committed the parser becomes to a syntactic parse, the harder it is to reanalyze the string (Ferreira & Henderson 1991; 1993; Tabor & Hutchins 2004). For example, unless prosodic or contextual cues are employed to boost the activation of the correct parse, (44) will be preferentially misanalyzed as having the structure [NP [V [NP]]].

(44) Fat people eat accumulates.

<sup>59</sup>Lewis (1993: 146)


The garden-path effect that the digging-in causes in example (44) serves as an analogy for what may be happening in particularly difficult subject island violations. In both cases, the sentences have exactly one grammatical analysis, but that parse is preempted by a highly preferential alternative which ultimately cannot yield a complete analysis of the sentence. Thus, without prosodic cues indicating the extraction site, sentences like (45) induce a significant digging-in effect as well.

	- b. ? Which disease will [a cure for \_] be found by you?

This also explains why SI violations like (46) are relatively acceptable: a subject NP containing a subordinate CP is more expectable and easier to process than a CP subject, even though the former is more complex than the latter.<sup>60</sup> Clausal subjects are unusual structures, inconsistent with the parser's expectations (Fodor et al. 1974), and a filler-gap dependency into an NP-embedded clausal subject is less likely to cause the parse to go awry than a filler-gap dependency into a clausal subject.<sup>61</sup>

	- b. [Which crime] did the fact that nobody was accused of \_ astonish you the most?
	- c. [Which question] did the fact that none of us could answer \_ surprise you the most?
	- d. [Which joke] did the fact that nobody laughed at \_ surprise you the most?

<sup>60</sup>For claims that NP-embedded clausal SI violations are illicit see Lasnik & Saito (1992: 42), Phillips (2006: 796), and Phillips (2013a: 67).

<sup>61</sup>Clausen (2010; 2011) provides experimental evidence that complex subjects cause a measurable increase in processing load, with and without extraction. Moreover, it is known that elderly adults have far more difficulty repeating sentences with complex subjects than sentences with complex objects (Kemper 1986). Similar difficulty is found in timed reading comprehension tasks (Kynette & Kemper 1986) and in disfluencies in non-elderly adults (Clark & Wasow 1998). Speech initiation times for sentences with complex subjects are also known to be longer than for sentences with simple subjects (Ferreira 1991; Tsiamtsiouris & Cairns 2009). Finally, Garnsey (1985), Kutas et al. (1988), and Van Petten & Kutas (1991) show that the processing of open-class words, particularly at the beginning of sentences, requires greater effort than that of closed-class words.


# **8.2 Accounts of SI effects**

This complex array of effects suggests that the SI constraint is not due to a single factor (Chomsky 2008; Chaves 2013; Jiménez–Fernández 2009), be it grammatical or otherwise. One possibility is that SIs are partly due to pragmatic and processing constraints, perhaps not too different from those that appear to be active in the island effects discussed so far. As Kluender (2004: 495) notes: "Subject Island effects seem to be weaker when the *wh*-phrase maintains a pragmatic association not only with the gap, but also with the main clause predicate, such that the filler-gap dependency into the subject position is construed as of some relevance to the main assertion of the sentence". Indeed, many authors (Erteschik-Shir 1981; Van Valin 1986; Kuno 1987; Takami 1992; Deane 1992; Goldberg 2013) have argued that extraction is in general restricted to the informational focus of the proposition, and that SIs (among others) are predicted as a consequence. In a nutshell, since subjects are typically reserved for topic continuity, subject-embedded referents are unlikely to be the informational focus of the utterance. Although it is not easy to construct sentences where a dependent of the subject can easily be deemed the informational focus, it is by no means impossible. For instance, (47a) is particularly acceptable because whether or not an impeachment causes outrage crucially depends on who is impeached (cf. with *Would the impeachment of Donald Trump cause outrage?*). Similarly, in (47b) whether or not an attempt failed or succeeded crucially depends on what was attempted (cf. with *The attempt to find the culprit ended in failure*).

	- b. What did [the attempt to find \_] end in failure?<sup>63</sup>

Although experimental research has confirmed that sentences with SI violations tend to be less acceptable than grammatical controls (Sprouse 2009; Goodall 2011; Crawford 2012; Clausen 2011; Sprouse et al. 2015), and that their acceptability remains consistently low during repeated exposure (Sprouse 2009; Crawford 2012), other research has found that the acceptability of SI violations is not consistently low, and can be made to increase significantly (Hiramatsu 2000; Clausen 2011; Chaves 2012a; Chaves & Dery 2014). This mixed evidence is consistent with the idea that SI effects are very sensitive to the particular syntax, semantics, and pragmatics of the utterance in which they occur. If the items are too complex, or stylistically awkward, or presuppose unusual contexts, then SI effects are strong. For example, if the extraction is difficult to process because the sentence gives rise to local garden-path and digging-in effects, and is pragmatically infelicitous in the sense that the extracted element is not particularly relevant for the proposition (i.e. unlikely to be what the proposition is about) or comes from the presupposition rather than the assertion, then we obtain a very strong SI effect. Otherwise, the SI effect is weaker, and in some cases nearly non-existent, as in (47), (35), or the pied-piping examples studied by Abeillé et al. (2018). The latter involve relative clauses, in which subjects are not strongly required to be topics, in contrast to the subjects of main clauses.

<sup>62</sup>Chaves (2012b: 467)

<sup>63</sup>Hofmeister & Sag (2010: 370)

This approach also explains why subject-embedded gaps often become more acceptable in the presence of a second non-island gap: since the two gaps are coindexed, the fronted referent is trivially relevant for the main assertion, as it is a semantic argument of the main verb. For example, the low acceptability of (48a) is arguably caused by the lack of plausibility of the described proposition: without further contextual information, it is unclear how the attempt to repair an unspecified thing is connected to that attempt damaging a car.<sup>64</sup>

(48) a. \* What did [the attempt to repair \_] ultimately damage the car?
     b. What did [the attempt to repair \_] ultimately damage \_?

The example in (48a) becomes more acceptable if it is contextually established that the thing being repaired is a component of the car. In contrast, (48b) is felicitous even out of the blue because it conveys a proposition that is readily recognized as plausible according to world knowledge: attempting to fix something can cause damage to it. If Subject Island effects are indeed contingent on how relevant the extracted subject-embedded referent is for the assertion expressed by the proposition, then a wide range of acceptable patterns is to be expected, parasitic or otherwise. This includes cases like (49), where both gaps are in SI environments. As Levine & Sag (2003), Levine & Hukari (2006: 256) and Culicover (2013: 161) note, cases like (49) should be completely unacceptable, contrary to fact.

(49) This is a man who [friends of \_] think that [enemies of \_] are everywhere.

The conclusion that SI effects are contingent on the particular proposition expressed by the utterance and its pragmatics thus seems unavoidable (Chaves & Dery 2014). In order to test this hypothesis, Chaves & Dery (2014) examine the acceptability of sentences like (50), which crucially express nearly identical truth conditions and have equally acceptable declarative counterparts. This way, any source of acceptability contrast must come from the extraction itself, not from the felicity of the proposition.

<sup>64</sup>The examples in (48) are due to Phillips (2006: 796).

(50) a. Which country does the King of Spain resemble [the President of \_]?
     b. Which country does [the President of \_] resemble the King of Spain?

The results indicate that although the acceptability of the SI counterpart in (50b) is initially significantly lower than that of (50a), it gradually improves. After eight exposures, the acceptability of near-truth-conditionally-equivalent sentences like those in (50) becomes statistically indistinguishable. What this suggests is that SI effects are at least in part probabilistic: the semantic, syntactic, and pragmatic plausibility of a subject-embedded gap matters for how acceptable such extractions are. This is most consistent with the claim that – in general – extracted phrases must correspond to the informational focus of the utterance (Erteschik-Shir 1981; Van Valin 1986; Kuno 1987; Takami 1992; Deane 1992; Goldberg 2013), and in particular with the intuition that SI violations are weaker when the extracted referent is relevant for the main predication (Kluender 2004: 495).

# **9 Adjunct islands**

Cattell (1976) and Huang (1982) noted that adjunct phrases often resist extraction, as illustrated in (51). The constraint blocking such extractions is usually referred to as the *Adjunct Island Constraint* (AIC).

	- b. \* What did John arrive while [whistling \_]?<sup>66</sup>
	- c. \* Which club did John meet a lot of girls [without going to \_]?<sup>67</sup>
	- d. \* Who did Robin laugh [after Pat called on the phone \_]?<sup>68</sup>

Although a constraint on SLASH could effectively ban all extraction from adjuncts, the problem is that the AIC has a long history of exceptions, noted as early as Cattell (1976: 38), and by many others since, including Chomsky (1982: 72), Engdahl (1983), Hegarty (1990: 103), Cinque (1990: 139), Pollard & Sag (1994: 191), Culicover (1997: 253), and Borgonovo & Neeleman (2000: 200). A sample of representative counterexamples is provided in (52).

<sup>65</sup>Truswell (2007: 1357)

<sup>66</sup>Truswell (2007: 1359)

<sup>67</sup>Cattell (1976: 38)

<sup>68</sup>based on Huang (1982: 503)

	- b. Who would you rather sing [with \_]?
	- c. What temperature should I wash my jeans [at \_]?
	- d. That's the symphony that Schubert died [without finishing \_].
	- e. Which report did Kim go to lunch [without reading \_]?
	- f. A problem this important, I could never go home [without solving \_ first].
	- g. What did he fall asleep [complaining about \_]?
	- h. What did John drive Mary crazy [trying to fix \_]?
	- i. Who did you go to Girona [in order to meet \_]?
	- j. Who did you go to Harvard [in order to work with \_]?

Exceptions to the AIC include tensed adjuncts, as noted by Grosu (1981: 88), Deane (1991: 29), Levine & Hukari (2006: 287), Goldberg (2006: 144), Chaves (2012b: 471), Truswell (2011: 175, ft. 1) and others. A sample is provided in (53).<sup>69</sup>

	- b. This is the house that Mary died [before she could sell \_].
	- c. The person who I would kill myself [if I couldn't marry \_] is Jane.
	- d. Which book will Kim understand linguistics better [if she reads \_]?
	- e. This is the watch that I got upset [when I lost \_].
	- f. Robin, Pat and Terry were the people who I lounged around at home all day [without realizing \_ were coming for dinner].
	- g. Which email account would you be in trouble [if someone broke into \_]?
	- h. Which celebrity did you say that [[the sooner we take a picture of \_], [the quicker we can go home]]?

<sup>69</sup>Truswell (2011) argues that the AIC and its exceptions are best characterized in terms of event-semantic constraints, such that the adjunct must occupy an event position in the argument structure of the main clause verb. However, recent experimental research has been unable to validate Truswell's acceptability predictions (Kohrt et al. 2019), and moreover, such an account incorrectly predicts that extraction from tensed adjuncts is impossible (Truswell 2011: 175, ft. 1).


To be sure, some of these sentences are complex and difficult to process, which in turn can lead speakers to prefer the insertion of an "intrusive" resumptive pronoun at the gap site, but they are certainly more acceptable than classic tensed AIC violations like Huang's (51d). Acceptable tensed AIC violations are more frequent in languages like Japanese, Korean, and Malayalam.

Like Subject Islands, AIC violations sometimes improve "parasitically" in the presence of a second gap, as in (54). Note, first, that these sentences express radically different propositions, so there is no reason to assume that they are all equally felicitous. Second, (54a, c) describe plausible states of affairs in which it is clear what the extracted referent has to do with the main predication and assertion, simply because *document* is a semantic argument of *read*. In contrast, (54b) describes an unusual state of affairs: out of the blue, it is unclear what the extracted referent has to do with the main predication *read the email*. Basically, what does reading emails have to do with filing documents?

	- b. \* Which document did John read the email before filing \_?
	- c. Which document did John read \_ before filing a complaint?

If AIC violations were truly only salvageable parasitically, then counterexamples like (55a) should not exist. As Levine & Sag (2003) and Levine & Hukari (2006: 256) note, both gaps reside in island environments and should be completely out and less acceptable than (55b, c), contrary to fact.

	- b. \* What kinds of books do [authors of \_] argue about royalties after writing malicious pamphlets?
	- c. \* What kinds of books do authors of malicious pamphlets argue about royalties [after writing \_]?

In (55a), there is no sense in which the gap inside the subject is parasitic on the gap inside the adjunct, or vice-versa – under the assumption that neither gap is supposed to be licit without the presence of a gap outside an island environment. In conclusion, the notion of parasitic gap is rather dubious. See Levine & Hukari (2006: 256–273) for more in-depth discussion of parasitism and empirical criticism of null resumptive pronoun accounts.


As in the case of other island phenomena discussed so far, it is doubtful that any purely syntactic account can describe all the empirical facts. Rather, extractions out of adjuncts are licit to the degree that the extracted referent can be interpreted as being relevant for the assertion.

# **10 Superiority effects**

Contrasts like those below have traditionally been taken to be due to a constraint that prevents a given phrase from being extracted if another phrase in a higher position can be extracted instead (Chomsky 1973; 1980). Thus, the highest *wh*-phrase is extractable, but the lowest is not.


(57) a. Who did you persuade \_ to buy what?
     b. \* What did you persuade who to buy \_?

Several different kinds of exceptions to this *Superiority Constraint* (SC) have been noted in the literature. First, it is generally recognized that *which*-phrases are immune to the SC:

(58) a. I wonder which book which of our students read \_ over the summer?
     b. Which book did which professor buy \_?

Pesetsky (1987) proposed to explain the lack of SC effects in (58) by stipulating that *which*-phrases are interpreted as indefinites which do not undergo LF movement. Rather, they require "D-linking" and obtain wide scope via an entirely different semantic mechanism called unselective binding. In order for a phrase to be D-linked, it must be associated with a salient set of referents. But as Ginzburg & Sag (2000: 248) note, there is no independent evidence for such differences in saliency between *which* and other *wh*-phrases like *what* and *who*. For example, it is implausible that speakers have a specific referent in mind for the *which*-phrases in examples like (59).<sup>70</sup>

(59) a. I don't know anything about cars. Do you have any suggestions about which car – if any – I should buy when I get a raise?

<sup>70</sup>The examples in (59) are from Ginzburg & Sag (2000: 248).


b. I don't know anything about cars. Do you have any suggestions about what – if anything – I should buy when I get a raise?

Furthermore, there are acceptable SC violations involving multiple *wh*-questions such as those in (60). See Bolinger (1978), Kayne (1983) and Pesetsky (1987: 109) for more such examples and discussion.<sup>71</sup>

	- b. What did WHO take \_ WHERE?
	- c. Where did WHO take WHAT \_?

Finally, there are also SC violations that involve echo questions like (61) and reference questions like (62). See Ginzburg & Sag (2000: Chapter 7) for detailed argumentation that echo questions are not fundamentally different, syntactically or semantically, from other uses of interrogatives.

(61) b. What did WHO break \_?

(62) b. What did WHO break \_?

There are two different, yet mutually consistent, possible explanations for SC effects in HPSG circles. One potential factor concerns processing difficulty (Arnon et al. 2007). Basically, long-distance dependencies where a *which*-phrase is fronted are generally more acceptable and faster to process than those where *what* or *who* is fronted, presumably because the latter are semantically less informative, and thus decay from memory faster, and are compatible with more potential gap sites before the actual gap. The second potential factor is prosodic in nature. Drawing from insights by Ladd (1996: 170–172) about English interrogative intonation, Ginzburg & Sag (2000: 251) propose that in a multiple *wh*-interrogative construction, all *wh*-phrases must be in focus except the first. Crucially, focus is typically – but not always – associated with clearly discernible pitch accent. Thus, (56b) and (57b) are odd because the second *wh*-word is unaccented. In this account, a word like *who* has two possible lexical descriptions, shown in (63).

<sup>71</sup>Fedorenko & Gibson (2010) and others have found no evidence that the presence of a third *wh*-phrase improves the acceptability of a multiple interrogative, even with supporting contexts. However, the examples in (60) require peculiar intonation, which may be difficult to elicit with written stimuli.


Since only the (optionally accented) lexical entry in (63a) is specified with a non-empty WH value, the theory of extraction proposed in Ginzburg & Sag (2000: Chapter 5) predicts that (63a) must appear *ex situ*. In contrast, the accented lexical entry in (63b) can appear *in situ*. For more discussion, see Levine & Hukari (2006: 261).

A related range of island phenomena concerns extraction from *whether*-clauses, which is traditionally assumed to be forbidden, as (64) illustrates.


	- b. \* Which movie did John ask why Mary liked \_?

But again, the oddness of (64) is unlikely to be due to syntactic constraints, given the existence of passable counterexamples like (65).

	- b. Which glass of wine do you wonder whether I poisoned \_?<sup>73</sup>
	- c. Who is John wondering whether or not he should fire \_?<sup>74</sup>

As noted by Kroch (1998: 28), the reduced acceptability of an example like (66a) is better explained simply by noting the difficulty of accommodating its presupposition in (66b).

	- b. There was a sum of money about which John was wondering whether to pay it.

# **11 The Left Branch Condition**

Ross (1967: 207) discovered that the leftmost constituent of an NP cannot be extracted, as in (67), a constraint he dubbed the *Left Branch Condition* (LBC).<sup>75</sup>

	- b. \* Which did you buy [ \_ book]? (cf. with '*You bought which book?*')
	- c. \* How much did you find [ \_ money]? (cf. with '*You found how much money?*')

<sup>72</sup>Ross (1967: 27)

<sup>73</sup>Cresti (1995: 81)

<sup>74</sup>Chaves (2012b: 477)

<sup>75</sup>As in other island environments discussed above, the LBC is not operative in constraining semantic scope, as illustrated in the following example from Copestake et al. (2005: 303):

(i) Someone took a picture of each student's bicycle.


These facts are accounted for if Determiner Phrases (DPs) are not valents of the nominal head; see also Van Eynde (2021: Section 2.3.2), Chapter 8 of this volume. If the DP is not listed in the argument structure of the nominal head, then there is no way for the DP to appear in SLASH. Rather, the DP selects the nominal head, as shown in (68). See also Runner et al. (2006) for psycholinguistic evidence that reflexive binding to possessors involves binding-theory-exempt logophors, since reflexives in the PPs of NPs containing possessors are not in complementary distribution with pronouns.

Based on Sag (2012: 133), Chaves & Putnam (2020: 197, 198) assume that genitive DPs combine with nominal heads and bind their X-ARG index via a dedicated construction, not as valents. For example, in nominalizations like *Kim's description of the problem*, the DP *Kim's* is not a valent of *description*, and therefore cannot appear in SLASH; it is instead constructionally co-indexed with the agent role of the noun *description* via X-ARG. Moreover, the clitic *'s* in *Kim's* must lean phonologically on the NP it selects, and therefore cannot be stranded, for independently motivated phonological reasons, predicting the oddness of \**It was Kim who I read 's description of the problem*.
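To make the logic concrete, here is a minimal Python sketch of the reasoning, assuming (with Bouma et al. 2001-style amalgamation) that SLASH values can only come from ARG-ST members; the dictionary encoding and the entry for *description* are hypothetical simplifications, not the actual HPSG formalism.

```python
# Toy sketch: only ARG-ST members can be realized as gaps (and thus
# appear in SLASH); dependents added constructionally cannot.
def slash_candidates(entry):
    return set(entry["ARG-ST"])

# Hypothetical entry: "description" selects an of-PP; its genitive DP
# is co-indexed via X-ARG by a construction, so it is not a valent.
description = {"HEAD": "noun", "ARG-ST": ["PP[of]"]}

assert "PP[of]" in slash_candidates(description)       # the of-PP dependent can be extracted
assert "DP[gen]" not in slash_candidates(description)  # *It was Kim who I read _'s description ...
```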

There are various languages which do not permit extraction of left branches from noun phrases, but have a particular PP construction that appears to allow LBC violations. This is illustrated below in (69), with French data.


	- b. \* Quels avez-vous acheté [ \_ livres]? (French)
	     how.many have-you bought books
	     'How many books have you bought?'

But the LBC violation in (69a) is only apparent: *de livres* is in fact a postverbal *de*-N<sup>0</sup> nominal, and thus no LBC violation occurs in (69a). See Abeillé et al. (2004) for details. Finally, Ross (1967: 236–237) also noted that some languages do not obey the LBC at all. A small sample is given in (70). However, the languages in question lack determiners, and therefore it is possible that the extracted phrase has an independent status similar to that of the French *de*-N<sup>0</sup> phrase in (69a).



	- c. Ki-nek akarod, hogy halljam [ \_ a hang-já-t]? (Hungarian)
	     who-DAT you.want that I.hear the voice-POSS.3SG-ACC
	     'Whose voice do you want me to hear?'

# **12 The Complementizer Constraint**

Perlmutter (1968) noted that subject phrases have different extraction properties from object phrases, as illustrated in (71). The presence of the complementizer hampers extraction of the subject, but not of the complement.<sup>76</sup>

(71) a. [Who] did Tom say (\*that) \_ had bought the tickets?
     b. [Who] do you believe (\*that) \_ got you fired?

<sup>76</sup>There is no evidence that the Complementizer Constraint applies at the semantic level, however. The subject phrase of the embedded clause can outscope the subject phrase of the matrix:

(i) a. Some teacher claimed that each student had cheated.
    b. Every teacher claimed that a student had cheated.



Bresnan (1977: 194), Culicover (1993) and others also noted that Complementizer Constraint effects can be reduced in the presence of an adverbial intervening between the complementizer and the gap:

	- b. [Who] do you think that after years and years of cheating death \_ finally died?

In Bouma et al. (2001) and Ginzburg & Sag (2000: 181), extracted arguments are typed as *gap-synsem* rather than *canon-synsem*. Only the latter are allowed to correspond to *in situ* signs and to reside in valence lists. However, subject extraction is different: if a subject phrase is extracted, then the SUBJ list contains the respective *gap-synsem* sign. If one assumes that the lexical entry for the complementizer *that* requires S complements specified as [SUBJ ⟨⟩], then the oddness of (71) follows. For Ginzburg & Sag (2000: 181), the adverbial circumvention effect in (72) is the result of assuming that the complementizer selects an adverb and a clause as arguments (the second of which is required to have a subject gap). This analysis seems *ad hoc*, because the adverb would be expected to adjoin to the clause rather than being a complement of the complementizer. On the other hand, the analysis is consistent with what happens in French: when the subject of the complement CP is extracted, the complementizer must be *qui* instead of *que*, which could easily be captured by such an account.
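A toy rendering of this analysis may help; the dictionaries below stand in for signs and are an expository simplification of Ginzburg & Sag's actual feature structures.

```python
# Sketch: "that" requires an S complement whose SUBJ list is empty, so a
# clause whose subject is extracted (gap-synsem on SUBJ) is rejected.
def that_can_select(clause):
    return clause["SUBJ"] == []

saturated = {"SUBJ": [], "COMPS": []}                # "Robin got you fired"
subject_gap = {"SUBJ": ["gap-synsem"], "COMPS": []}  # "_ got you fired"

assert that_can_select(saturated)        # "... that Robin got you fired"
assert not that_can_select(subject_gap)  # *"Who do you believe that _ got you fired?"
```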

A simpler account of the Complementizer Constraint has emerged recently, however, one in principle compatible with any theory of grammar. For Kandybowicz (2006; 2009) and others, the Complementizer Constraint is prosodic in nature: complementizers must cliticize to the following phonological unit, but if a pause is made at the gap site then the complementizer cannot do so. Accordingly, if *that* is pronounced with a reduced vowel [ðət] rather than [ðæt], then the Complementizer Constraint violations in (71) improve in acceptability. Though this account is promising, Ritchart et al. (2016) found no experimental evidence of amelioration of Complementizer Constraint effects either with phonological reduction of the complementizer or with contrastive focus. Further research is needed to determine the true nature of Complementizer Constraint effects.


# **13 Island circumvention via ellipsis**

Ellipsis somehow renders island constraints inactive, as in (73). A deletion-based analysis of such phenomena, such as that of Merchant (2001), relies on moving the *wh*-phrase before deletion takes place, but since movement is assumed to be sensitive to syntactic island constraints, the prediction is that (73) should be illicit, contrary to fact.

	- b. Bo talked to the person who discovered something, but I still don't know what (\*Bo talked to the person who discovered \_ ). [CNPC violation]
	- c. That he'll hire someone is possible, but I won't divulge who (\*that he'll hire \_ is possible). [SSC violation]
	- d. She bought a rather expensive car, but I can't remember how expensive (\*she bought a car). [LBC violation]

The account adopted in HPSG is one in which remnants are assigned an interpretation based on the surrounding discourse context (Ginzburg & Sag 2000; Culicover & Jackendoff 2005; Jacobson 2008; Sag & Nykiel 2011). See Nykiel & Kim (2021), Chapter 19 of this volume for more detailed discussion. In a nutshell, the *wh*-phrases in (73) are "coerced" into a proposition-denoting clause via a unary branching construction that taps into contextual information. This straightforwardly explains not only why the antecedent for the elided phrase need not correspond to overt discourse – e.g. sluices like *What floor?* or *What else?* – but also why the examples in (73) are immune to island constraints: there simply is no island environment to begin with, and thus, no extraction to violate it. For more on ellipsis and island effects see Chaves & Putnam (2020: 108–109).

# **14 Conclusion**

HPSG remains relatively agnostic about many island types, given the existence of robust exceptions. It is however clear that many island effects are not purely due to syntactic constraints, and are more likely the result of multiple factors, including pragmatics, semantics, and processing difficulty. To be sure, it is as yet unclear how these factors can be brought together into an explicit and testable account of island effects. In particular, it is unclear how to combine probabilistic information with syntactic, semantic, and pragmatic representations, although one fruitful avenue for approaching this problem may be *Data-Oriented Parsing* (Neumann & Flickinger 1999; 2002; Arnold & Linardaki 2007; Bod et al. 2003; Bod 2009).

From its inception, HPSG has been meant to be compatible with models of language comprehension and production (Sag 1992; Sag & Wasow 2011; 2015), but not much work has been dedicated to bridging these worlds; see Wasow (2021), Chapter 24 of this volume. The challenge that island effects pose to any theory of grammar is central to linguistic theory and cognitive science: how to integrate theoretical linguistics and psycholinguistic models of on-line language processing so that fine-grained predictions about variability in acceptability judgments across nearly isomorphic clauses can be explained.

# **Acknowledgments**

Many thanks to Bob Borsley, Berthold Crysmann, Anne Abeillé, Jean-Pierre Koenig, and Stefan Müller for detailed comments about an earlier draft, and to Stefan Müller for invaluable editorial assistance.


# **Chapter 16**

# **Coordination**

# Anne Abeillé

Université de Paris

# Rui P. Chaves

University at Buffalo, SUNY

Anne Abeillé & Rui P. Chaves. 2021. Coordination. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 725–776. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599848

Coordination is a central topic in theoretical linguistics. Following GPSG, which provided the first formal analysis of unlike coordination, HPSG has developed detailed analyses of different coordination constructions in a variety of unrelated languages. Central to the HPSG analyses are two main ideas: (i) coordination structures are non-headed phrases, and (ii) coordinate daughters display some kind of parallelism, which is captured by feature sharing. From these ideas, specific properties can be derived, regarding extraction and agreement, for instance. Many HPSG analyses also agree that coordination is a cover term for a wide variety of different constructions which can be viewed as different subtypes of coordinate phrases, and which can be cross-classified with other subtypes of the grammar (nominal or not, with ellipsis or not, etc.). We present the description of various coordination phenomena and show that HPSG can account for their subtle properties, while integrating them into the general organization of the grammar.

# **1 Introduction**

In this chapter we refer to expressions like *and*, *either*, *or*, *but*, *let alone*, etc. as *coordinators* and the phrases that a coordinator can combine with as *coordinands*. Thus, in "A or B", both A and B are coordinands and *or* is the coordinator. A great deal of research has been dedicated to the topic of coordination structures in the last 70 years, spanning a multitude of different approaches in many different theoretical frameworks. With regard to the linguistic problems, research questions abound. In the realm of syntax there is much debate concerning the role of coordination lexemes, the existence of null coordinators, the syntactic relationship between coordinands, the peculiar extraction phenomena that certain coordination structures exhibit, the necessary properties that allow two different structures to be coordinated, the relation between coordination structures and comparative and subordination structures, peculiar ellipsis phenomena that can optionally occur, the various patterns of agreement that obtain in nominal coordination structures, the distribution and syntactic realization of the lexemes *either* and *or*, etc. In the realm of semantics, the issues are no less complex, and the debate no less lively. There are many questions pertaining to how exactly the meaning of coordination structures is construed.

Among the first attempts to offer a precise formalization of the syntax and semantics of coordination was the seminal work of Gazdar (1980). Other important work soon followed, including the demonstration that phrase structure grammar offered a way to model filler-gap dependencies and certain island constraints (Gazdar 1981). In particular, Gazdar's account showed how long-distance dependencies involving multiple gaps linked to the same filler phrase could be modeled straightforwardly, something that mainstream movement-based models still struggle with to this day. Finally, there were also in-depth examinations of a number of complex empirical phenomena in Gazdar et al. (1982), which proved highly influential in the genesis of Generalized Phrase Structure Grammar, and later, of HPSG. Coordination thus has a special place in the history of HPSG, and still figures in many theoretical arguments within Generative Grammar, given the extremely challenging phenomena it poses for linguistic theory. Nevertheless, there is no clear consensus, even within HPSG, about how to analyze coordination. For example, in some accounts the coordinator expression is a weak head, whereas in others it is a marker. Coordinate structures are binary branching in some accounts, but not in others. Finally, in some accounts, non-constituent coordination involves some form of deletion, but in others, no deletion operation is assumed. In this chapter we survey the empirical arguments and formal accounts of coordination, with special focus on its morphosyntax.

# **2 Headedness**

The head of a construction is traditionally defined as the constituent which determines the syntactic distribution and the meaning of the whole, and it is also often the case that a dependent can be omitted, fronted, or extraposed while the head cannot be (Zwicky 1985). In coordination constructions, something very different occurs. First, the syntactic category and the distribution of a coordinate phrase are collectively determined by the coordinands, not by one particular coordinand nor by the coordination particle. Thus, an S coordination yields an S, a VP coordination yields a VP, and so on, for virtually all categories.<sup>1</sup> This is perhaps clearer in cases like (1), where expressions such as *simultaneously*, *both*, and *together* can be used to show that the entire bracketed string is interpreted as a complex unit denoting a plurality.

	- b. Often [[Kim goes to the beach]<sub>S</sub> and [Sue goes to the city]<sub>S</sub>]<sub>S</sub>.
	- c. Sue [[read the instructions]<sub>VP</sub> and [dried her hair]<sub>VP</sub>]<sub>VP</sub> in twenty seconds.
	- d. You can't simultaneously [[drive a car]<sub>VP</sub> and [talk on the phone]<sub>VP</sub>]<sub>VP</sub>.
	- e. Simultaneously [[shocked]<sub>VP</sub> and [saddened]<sub>VP</sub>]<sub>VP</sub>, Robin decided to go home.
	- f. Robin is both [[tall]<sub>A</sub> and [thin]<sub>A</sub>]<sub>A</sub>.
	- g. [[Tom]<sub>NP</sub> and [Mia]<sub>NP</sub>]<sub>NP</sub> agreed to jump into the water together.

Generally, a coordinate structure has the same grammatical function and category as the coordinands: given a number of coordinands of category X, the distribution of the coordinate constituent that is obtained is again the same as that of an X constituent, which Pullum & Zwicky (1986: 752) refer to as Wasow's Generalization. In particular, this is what allows coordination to apply recursively:

	- b. I can either [[sing and dance]<sub>VP</sub> or [sing and play the guitar]<sub>VP</sub>]<sub>VP</sub>.
	- c. Either [[John went to Paris and Kim went to Brussels]<sub>S</sub> or [none of them ever left home]<sub>S</sub>]<sub>S</sub>.

Another piece of evidence in favor of a non-headed analysis comes from the fact that there is no typological correlation between the position of the coordinator and head directionality (Zwart 2005). For example, in Zwart's survey of 136 languages where half are verb-final and half verb-initial, verb-final languages overwhelmingly employ coordinator-initial strategies. In particular, 119 of these languages are exclusively coordinator-initial, 12 exhibit both coordinator-initial and coordinator-final strategies, and only 4 have exclusively coordinator-final structures.

<sup>1</sup>The exceptions include coordinator expressions themselves, e.g. \**You ordered a coffee and or or a tea?* This oddness may be due to the coordinands being of the wrong semantic type. See Section 5 for more on lexical coordination.

Finally, coordination is also special in that the relationship between coordinands is unlike adjunction (Levine 2001: 156–160). Whereas adjuncts can in principle be displaced, coordinands do not have any mobility, as (3) illustrates.

	- b. \* And Jane likes music, Tom learned to play the piano.

Thus, no coordinand can usually be said to be a dependent. For example, reversing the order of the coordinands in (4) causes no major change in meaning. Neither daughter can be said to be the head because no subordination dependency is established between coordinands.

	- b. Robin ordered a pizza and Sam ordered a burger.

To be sure, there are certain coordination structures like those in (5) which do not have such symmetric interpretations (Goldsmith 1985; Lakoff 1986; Levin & Prince 1986). Regardless, such constructions retain many of the properties that characterize coordinate structures, and therefore are likely to be coordinate just the same (Kehler 2002: Chapter 5).

	- b. Robin rode into the sunset and jumped on a horse.

For these reasons, HPSG adopts a rather traditional non-headed analysis of coordination, an approach going back to Bloomfield (1933: 195) and Ross (1967: Section 4.2), and later adopted in many other frameworks such as Pesetsky (1982: Section 3.1), Gazdar (1980: 407), and Huddleston et al. (2002: 1275), among many others. See Borsley (1994; 2005) and Chaves (2007: Chapter 2) for more discussion about previous claims in the literature that coordination structures are headed. Finally, we note that the HPSG account is in agreement with Chomsky (1965: 196), who argued against postulating complex syntactic representations without direct empirical evidence:<sup>2</sup>

It has sometimes been claimed that the traditional coordinate structures are necessarily right-recursive (Yngve 1960) or left-recursive (Harman, 1963, p. 613, rule 3i). These conclusions seem to me equally unacceptable. Thus to assume (with Harman) that the phrase "a tall, young, handsome, intelligent man" has the structure [[[[tall young] handsome] intelligent] man] seems to me no more justifiable than to assume that it has the structure [tall [young [handsome [intelligent man]]]]. In fact, there is no grammatical motivation for any internal structure […]. The burden of proof rests on one who claims additional structure beyond this. (Chomsky 1965: 196–197)

<sup>2</sup>In more recent times, Chomskyan theorizing has assumed that all structures should be binary branching purely on conceptual economy grounds; see Johnson & Lappin (1999) for criticism.

As we shall see, the empirical evidence suggests that the simplest and most parsimonious structure for coordination is neither left- nor right-recursive.

# **3 On the syntax of coordinate structures**

There is a wide range of coordination strategies in the languages of the world (Haspelmath 2007). In some languages, no coordinand is accompanied by any coordinator (asyndeton coordination, as in *We came, we saw, we conquered*); in others, one of the coordinands is accompanied by a coordinator (monosyndeton coordination, as in *We came, we saw, and we conquered*). Other strategies involve marking multiple coordinands with a coordinator (polysyndeton coordination; *We came, and we saw, and we conquered*), or all coordinands (omnisyndeton coordination; *Either you come or you go*). All of these are schematically depicted in (6); see Drellishak & Bender (2005) for more discussion about how to accommodate such typological patterns in a computational HPSG platform.


Finally, a single coordination strategy often serves to coordinate all types of constituent phrases, but in many languages, different coordination strategies only cover a subset of the types of phrases in the language. For example, in Japanese the clitic *to* is used for nominal coordination and *te* is used for other coordinations.

In what follows, we start by focusing on monosyndeton coordination. There are three possible structures one can assign to such coordinations, as Figure 1 illustrates. The binary branching approach (left) goes back to Yngve (1960: 456), and is used in HPSG work such as Pollard & Sag (1994: 200–205), Yatabe (2003), Crysmann (2008), Beavers & Sag (2004), Drellishak & Bender (2005), Chaves (2007), and Chaves (2012b), among others. The flat branching approach (center) has also been assumed in HPSG (Abeillé 2005; Abeillé, Bonami, et al. 2006; Mouret 2005; 2006; Bîlbîie 2017), and the totally flat approach (right) much less frequently (Sag et al. 2003; Sag 2003).<sup>3</sup>

Figure 1: Three possible headless analyses of coordination

The binary branching analysis requires two different rules, informally depicted in (7), and a special feature to prevent the coordinator from recursively applying to the last coordinand, e.g. \**Robin and and and Kim*. Otherwise, the two rules are unremarkable and are handled by the grammar like any other immediate dominance schema. See, for example, Beavers & Sag (2004) for a formalization.

(7) a. X<sup>+</sup> → *Coord* X<sup>−</sup>
    b. X → X<sup>−</sup> X<sup>+</sup>
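The way the two rules in (7) interact with the blocking feature can be sketched as follows; the boolean CRD feature and the rule names are ours, used purely for illustration.

```python
# Rule (7a): X+ -> Coord X-; only a coordinator-less coordinand qualifies.
def rule_7a(x):
    if x["CRD"] == "-":
        return {"CAT": x["CAT"], "CRD": "+"}
    raise ValueError("coordinand already carries a coordinator")  # *and and Kim

# Rule (7b): X -> X- X+; the result can itself be a coordinand again.
def rule_7b(left, right):
    assert left["CRD"] == "-" and right["CRD"] == "+" and left["CAT"] == right["CAT"]
    return {"CAT": left["CAT"], "CRD": "-"}

kim, robin = {"CAT": "NP", "CRD": "-"}, {"CAT": "NP", "CRD": "-"}
np = rule_7b(robin, rule_7a(kim))  # Robin and Kim
assert np == {"CAT": "NP", "CRD": "-"}
```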

Kayne (1994: Chapter 6) and Johannessen (1998: Chapter 3) argue that coordination follows X-bar theory and that the coordinator is the head of the construction; see Borsley & Müller (2021: Section 4.2.2), Chapter 28 of this volume. But in HPSG, even though one or more of the coordinands may combine with a coordinator, this subconstituent is not the head of the construction, which is considered unheaded. The two analyses are contrasted in Figure 2.

Similarly, the flat branching analysis, where the coordinator and the coordinand attach to each other, requires two rules as well (where n ≥ 1):

(8) a. X<sup>+</sup> → *Coord* X<sup>−</sup>
    b. X → X<sub>1</sub><sup>−</sup> … X<sub>n</sub><sup>−</sup> X<sup>+</sup>

<sup>3</sup>See Borsley (2005) for criticism of ConjP and of the binary branching analysis of coordinate structures with three coordinands. ConjP is also discussed in Borsley & Müller (2021: Section 4.2.2), Chapter 28 of this volume.


Figure 2: Binary-branching analyses of coordination, headed and non-headed

However, the totally flat analysis requires only one rule, and no special features at all, as (9) illustrates.

(9) X → X<sub>1</sub> … X<sub>n</sub> *Coord* X<sub>n+1</sub>
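By comparison, a sketch of the totally flat rule (9) needs neither a second rule nor a blocking feature; the dictionary encoding of daughters is, again, only illustrative.

```python
# Rule (9): X -> X_1 ... X_n Coord X_{n+1}, in one step.
def rule_9(daughters, coordinators=("and", "or")):
    *initial, coordinator, last = daughters
    assert initial and coordinator in coordinators
    cats = {d["CAT"] for d in initial + [last]}
    assert len(cats) == 1  # all coordinands share a category (but see below)
    return {"CAT": cats.pop()}

np = {"CAT": "NP"}
assert rule_9([np, np, "and", np]) == {"CAT": "NP"}  # Kim, Robin and Sue
```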

That said, there are some reasons for assuming that the coordinator does in fact combine with the coordinand, as in (8a). First, in some languages of the world, the coordinator is a bound morpheme instead of a free morpheme. For example, verbs are coordinated by adding one of a set of suffixes to one of the coordinands in Abelam (Papua New Guinea), usually the first one in a coordination of two items. Similarly, in Kanuri (Nilo-Saharan), verb phrases are coordinated by marking the first verb with a conjunctive form affix, and in languages like Telugu (Dravidian), the coordination of proper names is marked by the lengthening of their final vowels. This last strategy is illustrated in (10), quoted from Drellishak & Bender (2005: 111).

(10) kamalaa wimalaa poDugu (Telugu)
     Kamala Vimala tall
     'Kamala and Vimala are tall.'

Second, as Ross (1967: 165) originally noted, the natural intonation break occurs before the coordination lexeme, rather than between the coordinator and the coordinand, so that a prosodic constituent is formed. Although prosodic phrasing is not generally believed to always align with syntactic phrasing, the fact that the coordinator prosodifies with the coordinand suggests that the former forms a unit with the latter.

Aspects of the phrase structure rule in (8b) can be formalized in HPSG as shown in (11), using parametric lists (Pollard & Sag 1994: 396, fn. 2) to enforce that all coordinands structure-share the morphosyntactic information. The type *ne-list* (*non-empty-list*) corresponds to a list that has at least one member, and when used parametrically as in (11), it additionally requires that every member of the list bear the specification [SYNSEM|LOC|CAT 1].

(11) *coord-phrase* ⇒
     [ SYNSEM|LOC|CAT 1
       DTRS ⟨ [SYNSEM|LOC|CAT 1] ⟩ ⊕ *ne-list*([SYNSEM|LOC|CAT 1]) ]

The constraint forcing all daughters to be of the same category is too strong, as we shall see below, and will have to be revised later in the chapter. For now, we focus on standard coordinations.

In order to account for the fact that different kinds of coordination strategies are possible, Mouret (2006: 260) and Bîlbîie (2017: 205) define three subtypes of *coord-phrase*, assuming a lexical feature COORD to distinguish between coordination types:<sup>4</sup>

(12) a. *simple-coord-phrase* ⇒
        [ DTRS *ne-list*([COORD *none*]) ⊕ *ne-list*([COORD 1 *crd*]) ]
     b. *omnisyndetic-coord-phrase* ⇒
        [ DTRS *ne-list*([COORD 1 *crd*]) ]
     c. *asyndetic-coord-phrase* ⇒
        [ DTRS *ne-list*([COORD *none*]) ]

Here, we assume that the value of COORD must be typed as *coord*, and that the latter has various sub-types as shown in Figure 3. Thus, simple (monosyndeton and polysyndeton) coordinations are those where all but the first coordinand are allowed to combine with a coordinator, omnisyndeton coordinations are those where all coordinands have combined with a coordinator, and likewise, asyndeton coordinations are those where none of the coordinands have combined with a coordinator.
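The import of (12) can be emulated with a small classifier over the daughters' COORD values; the strings below stand in for the types in Figure 3 and are an assumption of this sketch.

```python
# Classify a coordination by which daughters carry a COORD value,
# mirroring the subtypes in (12): 'none' = no coordinator attached.
def coord_strategy(coord_values):
    marked = [v != "none" for v in coord_values]
    if not any(marked):
        return "asyndetic"        # we came, we saw, we conquered
    if all(marked):
        return "omnisyndetic"     # either you come or you go
    first = marked.index(True)
    if all(marked[first:]):       # unmarked prefix, then marked coordinands
        return "simple"           # covers mono- and polysyndeton
    return None                   # not licensed by (12)

assert coord_strategy(["none", "none", "and"]) == "simple"  # monosyndeton
assert coord_strategy(["none", "and", "and"]) == "simple"   # polysyndeton
assert coord_strategy(["either", "or"]) == "omnisyndetic"
assert coord_strategy(["none", "none"]) == "asyndetic"
```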

<sup>4</sup>Mouret's and Bîlbîie's formulations are slightly different in that the relevant feature is instead called CONJ, and a slightly different type hierarchy is assumed, with negative constraints like CONJ ≠ *nil* being employed instead of COORD *crd*. The current formulation avoids negative constraints, though nothing much hinges on this. Similar liberty is taken in subsequent constraints, for exposition purposes.

Strictly speaking, tags that appear only once in a structure are illegitimate, since tags are about sharing values. The purpose of the tags in (12a) and (12b) is to ensure that all members in the list have the same COORD value. A more precise way would add a constraint to (12a) and (12b) saying that 1 = ⊤, ⊤ (*top*) being the most general type in the type hierarchy. While this does not really add restrictive constraints on 1, it makes sure that all list members of the second list get the same COORD value, since all elements of the lists are [COORD 1] and since they are all shared with the 1 mentioned in 1 = ⊤.


Figure 3: Coordinator sub-types

We turn to the analysis of coordinators. In other words, what exactly are words like *and*, *or*, and others, and how do they combine with coordinands?

# **3.1 The status of coordinator expressions**

In HPSG, coordinators are sometimes analyzed as markers (Beavers & Sag 2004: Section 4.1; Drellishak & Bender 2005: Section 4.1). In such a view, the coordinator's lexical entry does not select any arguments, since it has none. In (13), we show the lexical entry for the conjunction, using current HPSG feature geometry. Note that the MRKG (marking) value of the coordinator is the same as the coordinand's, which makes this marker a bit unusual in that it is transparent. Thus, if *and* coordinates S nodes that are MRKG *that* (i.e. CPs in the analysis of Pollard & Sag 1994: Section 1.6), then the result will be an S that is also MRKG *that*, and so on, for any given value of MRKG.<sup>5</sup>

This sign imposes constraints on the head sign it combines with via the feature SEL(ECTION), the same feature that allows other markers and adjuncts in general to combine with their hosts. The syntactic construction that combines such elements with their selected heads is the Head-Functor Construction in (14). Since the second daughter is the head, the value of the mother's HEAD feature will have to be the same as the head daughter's, as per the Head Feature Principle.<sup>6</sup>

<sup>5</sup>The semantics and pragmatics of coordination is a particularly complex topic which we cannot do justice to here, especially when it comes to interactions with other phenomena such as quantifier scope and collective, distributive, and reciprocal readings. See Koenig & Richter (2021), Chapter 22 of this volume for more discussion and in particular Copestake et al. (2005: Section 6.7), Fast (2005), Chaves (2007: Chapters 4–6; 2012b: Section 5.3; 2012a; 2009), and Park (2019: Chapters 4–5) for HPSG work that specifically focuses on the semantics of coordination.

(14) *head-functor-phrase* ⇒
     [ SYNSEM|LOC|CAT [ SUBJ 1, COMPS 2, MRKG 3 ]
       HD-DTR 4
       DTRS ⟨ [ SYNSEM|LOC|CAT [ SEL 5, MRKG 3 ] ], 4 [ SYNSEM 5 [ LOC|CAT [ SUBJ 1, COMPS 2 ] ] ] ⟩ ]

Thus, the coordinator projects an NP when combined with an NP, an AP when combined with an AP, etc., as Figure 4 illustrates.

Figure 4: Coordinate marking constructions
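A sketch of the marker analysis in action, with dictionaries standing in for signs: the coordinator leaves the host's HEAD and MRKG values untouched and contributes only coordination marking.

```python
# The coordinator as a transparent marker: HEAD and MRKG come from the
# host (the head daughter); the marker only contributes COORD.
def mark_with_coordinator(host, coordinator="and"):
    marked = dict(host)            # HEAD, MRKG, etc. preserved
    marked["COORD"] = coordinator
    return marked

that_clause = {"HEAD": "verb", "MRKG": "that"}
assert mark_with_coordinator(that_clause) == {
    "HEAD": "verb", "MRKG": "that", "COORD": "and",
}  # "and that S" is still an S[MRKG that]
```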

An alternative HPSG account that yields almost the same representation through different means is adopted by Abeillé (2003; 2005), Mouret (2007), Bîlbîie (2017), and others. This approach takes coordinators to be *weak heads*, i.e. heads which inherit most of their syntactic properties from their complement, like argument-marking prepositions do. Thus, the coordinator combines with coordinands via the same headed constructions that license non-coordinate structures. It preserves the MRKG feature when coordinands are themselves marked. The coordinator takes the adjacent coordinand as a complement. This captures its being first in head-initial languages like English, and its final position in head-final languages like Japanese.

<sup>6</sup>The Head Feature Principle (Pollard & Sag 1994: 34) states that the value of the mother's HEAD feature is identical to that of the head daughter's HEAD feature. See also Abeillé & Borsley (2021: 22), Chapter 1 of this volume.


(15) a. Lee [and Kim] (English)

	b. Lee=to Kim (Japanese)
		Lee=and Kim
		'Lee and Kim'

Since it is a weak head, it inherits most of its syntactic features (HEAD, MRKG) from its complement, and adds its own COORD feature. The lexical entry for the coordinator *and* is shown in (16).

The weak head analysis is illustrated in Figure 5. Here, the category of the coordinator, the coordinand, and the mother node are the same, because the coordinator's head value is lexically required to be structure-shared with the head value of the coordinand it combines with (which is its first complement; see Section 5 on lexical coordination to see why the coordinator may inherit some complements expected by the coordinand).

Figure 5: Coordinate weak-head constructions
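
To make the comparison concrete, here is a minimal Python sketch (our own simplification, with signs as plain dicts restricted to the HEAD, MRKG, and COORD features discussed above) showing that the marker analysis and the weak head analysis project the same mother category for *and Kim*:

```
def combine_marker(coordinator, coordinand):
    """Marker analysis: the coordinand is the head daughter; the mother
    gets its HEAD (Head Feature Principle) and its MRKG value, since the
    marker is 'transparent', plus the coordinator's COORD value."""
    return {"HEAD": coordinand["HEAD"],
            "MRKG": coordinand["MRKG"],
            "COORD": coordinator["COORD"]}

def combine_weak_head(coordinator, coordinand):
    """Weak head analysis: the coordinator is the head and takes the
    coordinand as its complement, but inherits HEAD and MRKG from it."""
    return {"HEAD": coordinand["HEAD"],     # inherited from the complement
            "MRKG": coordinand["MRKG"],
            "COORD": coordinator["COORD"]}  # the coordinator's own feature

and_lex = {"COORD": "and"}
kim_np = {"HEAD": "noun", "MRKG": "unmarked"}

# Both analyses yield an NP[COORD and] for "and Kim":
assert combine_marker(and_lex, kim_np) == combine_weak_head(and_lex, kim_np)
```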

Before moving on, we note that the weak head analysis of coordinators makes certain problematic predictions that the marker analysis in (13) does not make. Since coordinands are selected as arguments in the former approach, additional assumptions need to be made in order to prevent the extraction of coordinands as


in (17). If coordinands are arguments and hence listed in valence lists like COMPS and ARG-ST, then they are expected to be extractable (see Borsley & Crysmann 2021: Section 4, Chapter 13 of this volume and Chaves 2021: 668, Chapter 15 of this volume).

(17) \* Which boy did you compare Robin and \_? (cf. *Which boy did you compare Robin with \_?*)

For this reason, the members of ARG-ST of the coordinator are typed as *canonical* by Abeillé (2003: 17) to prevent their extraction, analogously to how prepositions in most languages (unlike English and a few others) prevent their complements from being extracted. See Abeillé, Bonami, et al. (2006: Section 3.2) for a weak head analysis of certain French prepositions.

# **3.2 Correlative coordination**

Having discussed monosyndeton coordination structures, we now move on to cases where multiple interdependent coordinators are present, such as correlative *either … or …*, *neither … nor …*, and *both … and …*. See Hofmeister (2010) for an account in HPSG. Given the linearization flexibility of the first coordinator, it can be analyzed in English as an adverbial rather than as a true coordinator:

	- b. Fred either bought a cooking book or he bought a gardening magazine.
	- c. Fred can either buy a cooking book or he can buy a gardening magazine.
	- b. John will both read the introduction and the conclusion.

In French, as in other Romance languages, the coordinator itself can be reduplicated, and for some coordinators (*soit* 'or' in French) reduplication is obligatory (Mouret 2005; Bîlbîie 2017: 205–206):

(20) a. Jean lira et l'introduction et la conclusion. (French)
		Jean read.FUT and the.introduction and the conclusion
		'Jean read both the introduction and the conclusion.'



Thus, there are different structures for different types of correlative, as Figure 6 illustrates. The one on the left is for correlatives that exhibit adverbial properties and the one on the right is for correlatives that do not. See Bîlbîie (2008: 33–36) for arguments that both types are attested in Romanian.

Figure 6: Two possible structures for correlative coordination

The correlative coordinate structure on the right is covered by (12b), since it requires the COORD feature to be the same for all coordinands.

# **3.3 Comparative correlatives**

When there is no overt coordinator, it is not always clear whether a binary clause construction is coordinate or not. Comparative correlatives such as (21) have been analyzed as coordinate by Culicover & Jackendoff (1999) for English (in syntax, though not in semantics) and as universally subordinate by den Dikken (2006).

(21) The more I read, the more I understand.


On the semantic side, the interpretation is something like: 'if I read more, I understand more'. Abeillé (2006) and Abeillé & Borsley (2008) propose that they are coordinate in some languages and subordinate in others. In English, one can add the adverb *then*, whereas in French, one can add the coordinator *et* 'and'. In English, the first clause can also be used as a standard adjunct (22).

	- b. Plus je lis, (et) plus je comprends. (French)
		more I read and more I understand
		'If I read more, I understand more.'
	- c. I understand more, the more I read.

As shown by Culicover & Jackendoff (1999: 549–550), the second clause shows matrix clause properties, not the first one:

	- b. \* The more we eat, the angrier you get, don't we?

Syntactic parallelism seems to be stricter in French; for example, clitic inversion or extraction must take place out of both clauses at the same time (Abeillé & Borsley 2008: 1152):

(24) a. Paul a peu de temps : aussi plus vite commencera-t-il, plus vite aura-t-il fini. (French)
		Paul has little of time so more fast start.FUT-he more fast AUX.FUT-he finish.PTCP
		'Paul has little time left: so the faster he starts, the faster he will finish.'

	b. C' est un livre que plus tu lis, plus tu apprécies.
		this is a book COMP more you read.2SG more you appreciate.2SG
		'This is a book that the more you read the more you like.'

In Spanish, comparative correlatives come in two varieties as the following examples by Abeillé, Borsley & Espinal (2006: 7) show: one that can be analyzed as subordinate as in (25a), and one that can be analyzed as coordinate, as in (25b).

(25) a. Cuanto más leo, (tanto) más entiendo. (Spanish)
		how.much more read.1SG that.much more understand.1SG
		'The more I read, the more I understand.'


	b. Más leo (y) más entiendo.
		more read.1SG and more understand.1SG
		'The more I read, the more I understand.'

Be they coordinate or subordinate, comparative correlatives are special kinds of construction: they are binary, with a fixed order (the meaning changes if the order is reversed, as in (26a)). The internal structure of each clause is also special. In English, it must start with *the* and a comparative phrase, as the oddness of (26b) shows, and may involve a long distance dependency (26c). Each clause must be finite, though copula omission is allowed, as shown in (26d).

	- b. \* I understand (the) more, I read (the) more.
	- c. The more I manage to read, the more I start to understand.
	- d. The more intelligent the students, the better the marks.

These *the*-clauses are a special subtype of finite clause, starting with a comparative phrase. Abeillé, Borsley & Espinal (2006: 19) and Borsley (2011: 14) define a CORREL feature which is a LEFT EDGE feature (see the EDGE feature in Bonami et al. 2004 for French liaison). Assuming a degree word *the*, which can only appear as a specifier of a comparative word, Borsley (2011: 13) defines the *the*-clause as a subtype of *head-filler-phrase* with [CORREL *the*]; see also Sag (2010: 527).

Comparative correlatives belong to a more general class of (binary) correlative constructions, including *as … so …*, and *if … then …* constructions (Borsley 2004: Section 3.2; 2011: 17–18).<sup>7</sup> Correlative constructions can be defined as follows, where *correl-phrase* is a sub-type of *declarative-clause* and the feature CORREL introduces a *correl* type hierarchy analogous to that of *coord* in Figure 3 above. The construction in (27) thus states that all correlative constructions have in common the fact that both daughters are marked by a special expression.

```
(27) correl-phrase ⇒
     [SYNSEM|LOC|CAT|CORREL none
      DTRS ⟨[SYNSEM|LOC|CAT|CORREL corr-mrk],
            [SYNSEM|LOC|CAT|CORREL corr-mrk]⟩]
```
Naturally, *correl-construction* has various sub-types, each imposing particular patterns of correlative marking, including coordinate correlatives. More specifically, this family of constructions comes in two varieties: asymmetric (for the

<sup>7</sup>This does not handle Hindi-type correlatives, which differ in that only the first clause is introduced by a correlative word, and the first clause is mobile and optional; see Pollard & Sag (1994: 228) for an analysis.

subordinate ones, like English comparative correlatives), and symmetric (for coordinate ones, like French comparative correlatives). The symmetric subtype inherits from *clausal-coordination-phrase*, while the asymmetric one inherits from the *head-adjunct-phrase*, as seen in Figure 7.

Figure 7: Type hierarchy for correlative constructions

Thus, asymmetric English comparative correlatives can be defined as in (28), where *the* is a sub-type of *corr-mrk* (i.e. is a correlative marker).

```
(28) asymmetric-correl-phrase ⇒
     [HD-DTR 1
      DTRS ⟨[SYNSEM|LOC|CAT|CORREL the],
            1[SYNSEM|LOC|CAT|CORREL the]⟩]
```
Similarly, symmetric French comparative correlatives can be defined as in (29), where both clauses are coordinated (the second one may be introduced by *et* or without a conjunction) and introduced by a comparative correlative marker (*plus* 'more', *moins* 'less', *mieux* 'better').

```
(29) symmetric-correl-phrase ⇒
     [DTRS ⟨[SYNSEM|LOC|CAT|CORREL compar],
            [SYNSEM|LOC|CAT [CORREL compar, COORD et]]⟩]
```

A more complete analysis would take into account the semantics as well (Sag 2010: Section 5.5). From a syntactic point of view, HPSG seems to be in a good position to handle both the general properties and the idiosyncrasy of the comparative correlative construction, as well as its crosslinguistic variation. For an analysis of a number of Arabic correlative constructions see Alqurashi & Borsley (2014). See also Borsley (2011) for a comparison with a tentative Minimalist analysis.


# **4 Phrasal coordination and feature resolution**

# **4.1 Feature sharing between coordinands**

The coordination construction in (11) requires the value of CAT to be structure-shared across the coordinands and the mother node. Given the large number of features within CAT, such a constraint makes a series of predictions and mispredictions. For example, this entails that all valence constraints are identical. Thus, in VP coordination, all nodes have an empty COMPS list and share exactly the same singleton SUBJ list, as illustrated in Figure 8. Thus, nothing needs to be said from the semantic composition side: the verbs will have to share exactly the same referent for their subject. The same goes for any other combination of categories of whatever part of speech.

Figure 8: Valence identity in coordination

All the unsaturated valence arguments become one and the same for all coordinands, and it becomes impossible to have daughters with different subcategorization information. For example, if one daughter requires a complement while the other does not, CAT identity is impossible. This correctly rules out a coordination of VP and V categories like the one in (30a), or S and VP as in (30b):

	- b. \* Fred [she has a hat]SUBJ ⟨⟩ and [smiled]SUBJ ⟨NP⟩.
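
The import of this CAT identity can be rendered as a toy check (a sketch, with CAT values as plain Python dicts rather than typed feature structures; structure sharing is modeled as outright equality, so any valence mismatch blocks coordination):

```
def coordinable(*cats):
    """True iff all coordinands have unifiable (here: equal) CAT values."""
    return all(cat == cats[0] for cat in cats)

vp = {"HEAD": "verb", "SUBJ": ["NP"], "COMPS": []}
v  = {"HEAD": "verb", "SUBJ": ["NP"], "COMPS": ["NP"]}  # still needs an object
s  = {"HEAD": "verb", "SUBJ": [],     "COMPS": []}

assert coordinable(vp, vp)        # VP and VP, as in Figure 8
assert not coordinable(vp, v)     # rules out V + VP coordination (30a)
assert not coordinable(s, vp)     # rules out S + VP coordination (30b)
```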

But there is other information in CAT besides valence. For example, the head feature VFORM encodes the verb form, and the coordination of inconsistent VFORM


values is ruled out as ungrammatical as seen in (31), while consistent values of VFORM are accepted as illustrated by (32).<sup>8</sup>

	- b. \* Sue [buy something]VFORM *inf* and [came home]VFORM *fin*.
	- b. Sue [buys groceries here]VFORM *fin* and [could be interested in working with us]VFORM *fin*.
	- c. Dan [protested for two years]VFORM *fin* and [will keep protesting]VFORM *fin*.

Yet another feature that resides in the CAT value of verbal expressions is the head feature INV, which indicates whether a given verbal expression is inverted or not. Hence, inverted structures cannot be coordinated with non-inverted ones:


But if the inverted clause precedes the non-inverted one, then such coordinations become somewhat more acceptable. In fact, Huddleston et al. (2002: 1332–1333) note attested cases like (35).

(35) Did you make your own contributions to a complying superannuation fund and your assessable income is less than \$31,000?

A similar problem arises for the feature AUX, which distinguishes auxiliary verbal expressions from those that are not auxiliary:

	- b. [Tom went to NY yesterday]AUX <sup>−</sup> and [he will return next Tuesday]AUX <sup>+</sup>.
	- c. Fred [sang well]AUX <sup>−</sup> and [will keep on singing]AUX <sup>+</sup>.

However, this problem vanishes in the account of the English Auxiliary System detailed in Sag et al. (2020), since in that analysis, the feature AUX does not indicate whether the verb is auxiliary or not. Rather, the value of AUX for auxiliary

<sup>8</sup>That said, some cases are more acceptable, such as (i):

(i) I expect [to be there]VFORM *inf* and [that you will be there too]VFORM *fin*.

See Section 6 for more discussion about such cases.


verbs is resolved by the construction in which the verb is used. Since all the constructions in (36) are canonical VPs (i.e. non-inverted), then all the coordinands in (36) are specified as AUX– in the Sag et al. (2020) analysis.

Similarly, argument-marking PPs cannot be coordinated with modifying PPs, simply because the two are specified with different PFORM and SELECT values. This explains the contrast in (37). The first PP is the complement that *rely* selects, but the second is a modifier. Thus, they have different CAT values and cannot be coordinated.

	- b. \* Kim relied on Mia and on Sunday.

Consequently, it is in general not possible to coordinate argument marking PPs headed by different prepositions, simply because they bear different PFORM values, as shown in (38).

(38) a. \* Kim depends [[on Sandy]PFORM *on* or [to Fred]PFORM *to*].

	b. \* Kim is afraid [[of Sandy]PFORM *of* and [to Fred]PFORM *to*].

Similarly, adjectives that are specified as PRD+ cannot be coordinated with PRD− adjectives, without stipulation:

(39) a. \* I became [former]PRD <sup>−</sup> and [happy]PRD <sup>+</sup>.

b. \* He is [happy]PRD <sup>+</sup> and [Fred]PRD <sup>−</sup>.

c. \* [Mere]PRD <sup>−</sup> and [happy]PRD <sup>+</sup>, Fred rode on into the sunset.

Since case information is also part of CAT, the theory predicts that coordinands must be consistent, which is borne out by the facts, as the unacceptability of (40) shows.<sup>9</sup> Many other examples of CAT mismatches exist, but the list above suffices to illustrate the breadth of predictions that follow from the feature geometry of CAT and the constraints imposed by the coordination construction.

(40) a. \* I saw [her and he].

	b. \* He likes [she and me].

Mispredictions also exist. We already discussed the example in (35), concerning the feature INV, but there are others. For example, requiring that the SLASH

<sup>9</sup>There are nonetheless collocational cases where the distribution of pronouns defies this pattern, due to presumably prescriptive forces (Grano 2006). See also Lohmann (2014: 105, 107) for a broader multifactorial study of binomial expressions in which syllable length and frequency have a major effect in predicting nominal coordinand order, among other things.


value of the coordinands be the same readily predicts Coordinate Structure Constraint effects like (41), but it incorrectly rules out asymmetric coordination violation cases like (42). See Goldsmith (1985), Lakoff (1986), Levin & Prince (1986), and Kehler (2002) for more examples and discussion.

	- b. \* [To him]1 PP [[Fred gave a football \_]SLASH⟨1⟩ and [Kim gave me a book]SLASH⟨⟩].
	- c. \* [To him]1 PP [[Fred gave a football to me]SLASH⟨⟩ and [Kim gave a book \_]SLASH⟨1⟩].
	- b. What was the maximum amount that I can [contribute \_ SLASH⟨NP⟩ and still get a tax deduction SLASH⟨⟩]?

Chaves (2012b) argues that there are no independent grounds to assume that asymmetric coordination is anything other than coordination, and therefore the coordination construction must not impose SLASH identity across coordinands (GAP identity in his version of the theory). Rather, the Coordinate Structure Constraint and its asymmetric exceptions are best analyzed as pragmatic in nature, as Kehler (2002: Chapter 5) argues. See Borsley & Crysmann (2021: Section 3), Chapter 13 of this volume for more discussion. In practice, this means that the coordination construction should impose identity of some of the features in CAT, though not all, despite the fact that one of the prime motivations for CAT was coordination phenomena.

As in the case of locally specified valents, the category of the extracted phrase is also structure-shared in coordination. Hence, case mismatches like (43) are correctly ruled out.

(43) \* [Him]NP[*acc*], [all the critics like to praise \_]SLASH⟨NP[ ]⟩ but [I think \_ would probably not be present at the awards]SLASH⟨NP[ ]⟩.

There are, however, cases where the case of the ATB-extracted phrase can be syncretic (Anderson 1983). This is illustrated in (44) using examples by Levine et al. (2001: 205) and Goodall (1987: 75), respectively.

	- b. We went to see a movie which [the critics praised \_ but that Fred said \_ would probably be too violent for my taste].


The feature CASE is responsible for identifying the case of nominal expressions. Pronouns like *him* are specified as *acc*(*usative*), and pronouns like *I* are *nom*(*inative*), and expressions like *who* or *Robin* are left underspecified for case. According to Levine et al. (2001: 207), the case system of English involves the hierarchy in Figure 9.

Figure 9: Type hierarchy of (structural) case assignments

Finite verbs assign structural nominative (*snom*) to their subjects and structural accusative (*sacc*) to their objects. Most nouns and some pronouns like *who* and *what* are underspecified for case, and thus typed as *scase*, which makes them consistent with both nominative and accusative positions. Hence, *a movie* can simultaneously be required to be consistent with *snom* and *sacc* by resolving into the syncretic type *nom\_acc*, which is a subtype of both *snom* and *sacc*. Pronouns like *him* and *her* are specified as *acc* and therefore are not compatible with the *nom\_acc* type. The same goes for *nom* pronouns like *he* and *she*, etc. Hence, the problem of case syncretism is easily solved. See Section 6 for more discussion about the related phenomenon of coordination of unlike categories.
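
The resolution just described can be pictured with a few lines of Python over the hierarchy in Figure 9 (a sketch in our own encoding, with types as strings and unification computed as the greatest lower bound of the two types):

```
# Each type is mapped to the set of its subtypes (including itself),
# following Figure 9: nom_acc is a subtype of both snom and sacc.
SUBTYPES = {
    "scase":   {"scase", "snom", "sacc", "nom_acc"},
    "snom":    {"snom", "nom_acc"},
    "sacc":    {"sacc", "nom_acc"},
    "nom_acc": {"nom_acc"},
}

def unify_case(t1, t2):
    """Return the most general common subtype of t1 and t2, or None."""
    common = SUBTYPES[t1] & SUBTYPES[t2]
    for t in common:
        if common <= SUBTYPES[t]:   # t subsumes every other common subtype
            return t
    return None

# "a movie" is underspecified (scase), so it can satisfy both the snom
# demand of "said _ would be too violent" and the sacc demand of
# "praised _", resolving to the syncretic type nom_acc:
assert unify_case("snom", "sacc") == "nom_acc"
assert unify_case("scase", "snom") == "snom"
```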

# **4.2 Coordination and agreement**

Another thorny issue for syntactic theory and coordination structures concerns agreement. According to Pollard & Sag (1994: Section 2.4.2), agreement information is introduced by the INDEX feature in semantics, not morphosyntax. Hence, different expressions with inconsistent person, gender, and number specifications are free to combine. But Wechsler & Zlatić (2003: Chapter 2) have also argued that there should be a distinct feature called CONCORD, which is morphosyntactic in nature (see Wechsler 2021: Section 4.2, Chapter 6 of this volume). The motivation for this move is that there are languages, like Serbo-Croatian, which display hybrid agreement:

(45) Ta dobra deca su doš-l-a.<sup>10</sup> (Serbo-Croatian)
		that.SG.F good.SG.F children AUX.3PL come-PTCP-N.PL
		'Those good children came.'

<sup>10</sup>Wechsler & Zlatić (2003: 51)


The collective noun *deca* 'children' triggers feminine singular (morphosyntactic) agreement on NP-internal items, in this case the determiner *ta* 'that' and the adjective *dobra* 'good'. There are HPSG analyses that argue that what appears to be Closest Conjunct Agreement (see Section 4.3.1 below) is in fact agreement with the whole coordinate NP, which has additional features inherited from the first and last coordinands. Villavicencio et al. (2005: Section 5) propose two additional features: LAGR (for the left-most coordinand) and RAGR (for the right-most coordinand) for determiner and (attributive) adjective agreement in Romance, which involves the CONCORD feature. Semantic agreement, on the other hand, is seen in the verb *su*, which is inflected for third person plural, in agreement with the semantic properties of the subject *deca*. The two kinds of agreement are also visible in English:

	- b. The committee have/has made a decision.

The resolution of agreement information in coordination is not a trivial matter of matching the coordinands' agreement information. There are usually complex constraints involved in determining what the agreement of the mother node is, given that of the coordinands. We turn to this problem below.

# **4.3 Agreement strategies with coordinate phrases**

In case of coordinands with conflicting agreement values, various resolution strategies are observed crosslinguistically. For example, a coordination with a first person is first person, and a coordination with second person (and no first person) is second person:

	- b. Paul and you like yourselves / \* themselves.

In gender-marking languages, coordination with conflicting gender values is often resolved to masculine, at least for animates (Corbett 1991: 186). This is illustrated in (48) for Portuguese taken from or based on examples by Villavicencio, Sadler & Arnold (2005).

(48) a. o homem e a mulher modernos<sup>11</sup> (Portuguese)
		the.M.SG man.M.SG and the.F.SG woman.F.SG modern.M.PL
		'the modern man and woman'

<sup>11</sup>Villavicencio et al. (2005: 433)


	b. morbidez e morte prematuras<sup>12</sup>
		morbidity.F.SG and death.F.SG premature.F.PL
		'premature morbidity and death'

Sag (2003: 281) proposes that first person is a subtype of second person, which is itself a subtype of third person. This way, person resolution in coordination amounts to type unification. Addressing gender resolution, Aguila-Multner & Crysmann (2018) propose a list-based encoding of person and gender values, and list concatenation as a combining operation, as shown in (49). For gender, they propose a M(ASCULINE) feature that has an empty list value for feminine words, and a non-empty list value for masculine words. The coordination of a masculine noun (*chevaux* 'horses') with a feminine noun (*ânesses* 'female donkeys') yields a masculine NP with a non-empty list value for M. Only the coordination of two feminine nouns yields a feminine NP with an empty list value for M.

(49) *nom-coord-phrase* ⇒

For person agreement, they use two list valued features ME and YOU. A first person has a non-empty ME list, second person has an empty ME list and a non-empty YOU list, and third person has both empty lists. Thus, coordinating a first with a third person yields a ME feature with a non-empty list, and a YOU feature with a non-empty list, hence a first person phrase. Coordinating a third person with a second person yields a non-empty YOU list and an empty ME list, hence a second person phrase. This enables person and gender resolution by list concatenation over coordinands.
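
As a concrete illustration, the following Python sketch (our own encoding, with the ME, YOU, and M features as plain lists; the lexical items are illustrative) implements resolution by list concatenation:

```
def coordinate(*nps):
    """Coordination concatenates the coordinands' ME, YOU, and M lists."""
    return {feat: sum((np[feat] for np in nps), [])
            for feat in ("ME", "YOU", "M")}

def person(np):
    if np["ME"]:  return 1       # any 1st-person coordinand wins
    if np["YOU"]: return 2       # else any 2nd-person coordinand
    return 3

def gender(np):
    return "masc" if np["M"] else "fem"   # masculine unless all feminine

je      = {"ME": ["*"], "YOU": [],    "M": []}     # 1st person
tu      = {"ME": [],    "YOU": ["*"], "M": []}     # 2nd person
anesses = {"ME": [],    "YOU": [],    "M": []}     # 'female donkeys', feminine
chevaux = {"ME": [],    "YOU": [],    "M": ["*"]}  # 'horses', masculine

assert person(coordinate(tu, anesses)) == 2        # 2nd + 3rd -> 2nd person
assert person(coordinate(je, tu)) == 1             # 1st + 2nd -> 1st person
assert gender(coordinate(chevaux, anesses)) == "masc"
assert gender(coordinate(anesses, anesses)) == "fem"
```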

<sup>12</sup>See Villavicencio et al. (2005: 434) for similar examples.


### **4.3.1 Closest Conjunct Agreement**

As observed by Corbett (1991: 186), many languages, including Romance, Celtic, Semitic, and Bantu languages, also have another strategy: partial agreement with only one coordinand, the one closest to the target, called *Closest Conjunct Agreement* (CCA). In the following examples, again from Portuguese and taken from Villavicencio et al. (2005), the determiner and prenominal adjective agree with the first noun (50a) and the postnominal adjective with the last noun (50b).

	- b. Esta canção anima os corações e mentes brasileiras.<sup>14</sup> (Portuguese)
		this.F.SG song.F.SG animates the.M.PL hearts.M.PL and minds.F.PL Brazilian.F.PL
		'This song animates Brazilian hearts and minds.'

For French determiners and attributive adjectives, An & Abeillé (2017) and Abeillé et al. (2018) show on the basis of corpus data and experiments that number agreement may also obey CCA. As far as gender is concerned, prenominal adjectives always obey CCA, while postnominal ones do so half of the time (in contemporary French). In (51a), the determiner can be singular (CCA) or plural (resolution), while in (51b), CCA (feminine Det) is obligatory. In (51c), the postnominal adjective can be masculine (resolution) or feminine (CCA), with the same meaning.

	- b. certaines / \*certains collectivités et organismes publics<sup>16</sup> (French)
		certain.F.PL certain.M.PL collectivity.F.PL and organism.M.PL public.M.PL
		'certain public collectivities and organisms'

<sup>13</sup>Villavicencio et al. (2005: 435)

<sup>14</sup>Villavicencio et al. (2005: 437)

<sup>15</sup>An & Abeillé (2017: 34)

<sup>16</sup>Abeillé et al. (2018: 17)


	c. des départements et régions importants / importantes
		some department.M.PL and region.F.PL important.M.PL important.F.PL
		'some important departments and regions'

As proposed by Wechsler & Zlatić (2003: Chapter 2), HPSG distinguishes two agreement features: CONCORD is used for morphosyntactic agreement and INDEX is used for semantic agreement (see Wechsler 2021: Section 4.2, Chapter 6 of this volume). Moosally (1999) proposes an account of single coordinand predicateargument agreement in Ndebele, which she analyses as INDEX agreement. She has a version of the following constraint that shares the INDEX value of the (nominal) coordinate mother with that of the last coordinand (p. 389):

```
(52) nom-coord-phrase ⇒
     [SYNSEM|LOC|CONT|INDEX 1
      DTRS ⟨[], …, [SYNSEM|LOC|CONT|INDEX 1]⟩]
```

But in other languages, such as Welsh, there is evidence that the INDEX of the coordinate structure is resolved, even though predicate-argument agreement is controlled by the closest coordinand:

(53) Dw i a Gwenllian heb gael ein talu.<sup>17</sup> (Welsh)
		be.1SG I and Gwenllian.3SG without get CL.1PL pay
		'Gwenllian and I have not been paid.'

This is why Borsley (2009) proposes that CCA is superficial in Welsh and uses linearization domains<sup>18</sup> to handle partial agreement between the initial verb and the first coordinand, which are not sisters. The hypothesis was that verb-subject agreement involves order domains and coordinate structures are not represented in order domains. This allows what looks like agreement with a closest coordinand to be just that. See also Wechsler (2021: Section 7.2), Chapter 6 of this volume. The alternative developed by Villavicencio et al. (2005) assumes that coordinate structures have features reflecting the agreement properties of their first and last coordinands, to which agreement constraints may refer. As mentioned above, Villavicencio et al. (2005) use three features: CONCORD, LAGR (for the left-most coordinand), and RAGR (for the right-most coordinand).

<sup>17</sup>Sadler (2003: 90)

<sup>18</sup>Order domains were introduced into HPSG by Reape (1994); for more on order domains see Müller (2021: Section 6), Chapter 10 of this volume.


```
(54) nom-coord-phrase ⇒
     [SYNSEM|LOC|CAT|HEAD [LAGR 1, RAGR 2]
      DTRS ⟨[SYNSEM|L|CAT|HEAD|LAGR 1], …,
            [SYNSEM|L|CAT|HEAD|RAGR 2]⟩]

(55) noun ⇒ [CONCORD 1, LAGR 1, RAGR 1]
```

Nouns have the same value for CONCORD, LAGR, and RAGR, and determiner and (attributive) adjective agreement in Romance involves the CONCORD feature. Attributive adjectives constrain the agreement features of the noun they modify (via the MOD or SEL feature). One may distinguish two types for prenominal and postnominal adjectives, by the binary LEX ± feature (Sadler & Arnold 1994) or by the WEIGHT light/non-light feature (Abeillé & Godard 1999). In this perspective, each has its agreement pattern, which we simplify as follows, using '∨' to express a disjunction of feature values:


In the absence of coordination, these constraints apply vacuously, since CON-CORD, LAGR, and RAGR all share the same values.
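
A minimal sketch of how (54) and (55) interact, assuming CONCORD values are (gender, number) pairs (our own simplification; the helper names are not from the chapter):

```
def nom_coord(*nouns):
    """(54): the coordinate NP exposes the first daughter's agreement as
    LAGR and the last daughter's as RAGR. By (55), each noun daughter has
    CONCORD = LAGR = RAGR, so reading off CONCORD at the edges suffices."""
    return {"LAGR": nouns[0]["CONCORD"], "RAGR": nouns[-1]["CONCORD"]}

departements = {"CONCORD": ("masc", "pl")}   # départements
regions      = {"CONCORD": ("fem", "pl")}    # régions

np = nom_coord(departements, regions)
# A postnominal adjective showing Closest Conjunct Agreement, like
# "importantes" in (51c), targets RAGR; prenominal functors target LAGR:
assert np["RAGR"] == ("fem", "pl")
assert np["LAGR"] == ("masc", "pl")
```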

# **5 Lexical coordination**

While coordinands have often been assumed to be phrasal (see for example Kayne 1994: Section 6.2 and Bruening 2018: Section 5.2, among others), Abeillé (2006) gives several arguments in favor of lexical coordination. In some contexts, words (or phrases with a premodifier) are allowed, but not full phrases. In English, this is the case with prenominal adjectives and postverbal particles. See Abeillé (2006: Section 4) for similar examples with various categories in different languages. Most English attributive adjectives are prenominal unless they have a complement. Although adjectival phrases with complements are not licit in prenominal


position, it is possible to have complex adjectival expressions if they are coordinate.

	- b. \* a [taller than you] man
	- c. \* a [proud of his work] man
	- d. a [big and tall] man

As observed by Pollard & Sag (1987: 176–177), a particle may project a phrase after the nominal complement (59a), but not before (59b); but coordination is possible, at least for some speakers, as the example in (59c) from Abeillé (2006: 23) shows.

	- b. Paul turned [(\*completely) off] the radio.
	- c. Paul was turning [on and off] the radio all the time.

While phrasal coordination can conjoin unlike categories (see below), this is not the case with lexical coordination:

	- b. # Paul is [head and proud] of the school.

Semantically, lexical coordination is more constrained than phrasal coordination. With *and*, two lexical verbs that share a preverbal clitic in French must share the same verbal root, and in Spanish, they must refer to the same event (Bosque 1987).


'I buy it today and sell it tomorrow.'


Some apparent cases of lexical coordination may be analyzed as Right-Node Raising (Beavers & Sag 2004). These cases differ semantically and prosodically from Right-Node Raising, however: with typical Right-Node Raising, the two coordinands must stand in contrast to one another, and do not have to refer to the same event. With Right-Node Raising, there is usually a prosodic boundary at the ellipsis site (Chaves 2014: 843–844 and Nykiel & Kim 2021: Section 6.2, Chapter 19 of this volume). In French, the first coordinand cannot end with a clitic article or with a weak preposition as in (62b,c), quoted from Abeillé (2006: 14).


No such boundary occurs before the coordinator in lexical coordination. Thus, in French, clitic articles or weak prepositions with a shared argument can be conjoined (Abeillé 2006: 14):


	b. un film de et avec Woody Allen
		a film by and with Woody Allen

The functor analysis of coordinands in (13) is compatible with lexical coordination, since the head-functor phrase in (14) has the same valence features as the head. The weak head analysis in (16) is also compatible, since the coordinator inherits the complements expected by the coordinand (this is done by concatenation of COMPS lists as it is for complex predicates; see Godard & Samvelian 2021: Section 3, Chapter 11 of this volume).

The construct resulting from the coordination of lexical elements has hybrid properties: as a syntactic construct, it must be a phrase, but it also behaves as a word. Coordinate verbs behave as lexical heads; coordinate adjectives may occur in positions ruled out for phrases. To overcome this apparent paradox, Abeillé (2006: Section 5.1) analyses it as an instance of a "light" phrase, following the


WEIGHT account of Abeillé & Godard (2000; 2004). Light elements can be words or phrases, and can have restricted mobility (see Müller 2021, Chapter 10 of this volume). For example, prenominal modifiers can be constrained to be [WEIGHT *light*]. In this theory, light phrases can be coordinate phrases or head-adjunct phrases, provided all their daughters are light. Figure 10 illustrates this, assuming a functor analysis.

Figure 10: Examples of lexical coordination

# **6 Coordination of unlike categories**

The categories of coordinands are required to be the same per the coordination construction in (11). But this requirement is excessive, as illustrated by the coordinations in (64) from Bayer (1996: 580) and Huddleston et al. (2002: 1327), among others; see Chaves (2013: 169–170). Such data raise the problem of determining what the part of speech and the categorial status of the coordinate phrase should be.

	- b. Pat is [a Republican]NP and [proud of it]AP.
	- c. Jack is [a good cook]NP and [always improving]VP.



Building on observations from Jacobson (1987: 417), Sag (2003) and others pointed out that the features of the mother are not simply the intersection of the features of the coordinands. For example, verbs like *remain* are compatible with both AP and NP complements, whereas *grew* is only compatible with APs. This is shown in (65). Crucially, however, the information associated with the phrase *wealthy and a Republican* somehow allows *grew* to detect the presence of the nominal, as (66a) illustrates, even when the verbs are coordinated, as in (66b–d).

	- b. Kim remained/\*grew a Republican.
	- b. Kim grew and remained wealthy.
	- c. \* Kim grew and remained a Republican.
	- d. \* Kim grew and remained [wealthy and a Republican].

A number of influential accounts in Type-Logical Grammar (Morrill 1990; Morrill 1994; Bayer 1996) use disjunction introduction, one of the rules of inference from propositional calculus, in order to deal with coordination of unlikes phenomena. Disjunction introduction allows one to infer *A* ∨ *B* from *A*, and if one assumes that categories like NP, PP, and so on can also be disjunctive, the grammar allows an expression of type 'NP' to lead a double life as an 'NP ∨ PP' expression, or the type 'AP' to be taken as an 'AP ∨ PP ∨ NP', and so on. This kind of approach has been adapted to HPSG; see, for example, Daniels (2002) and Yatabe (2004). Related work, such as Sag (2003), aims to achieve the same result using type-underspecification. Other, more exploratory work views coordination of unlike categories as the result of parts of speech being gradient and epiphenomenal rather than hard-coded into the type signature (Chaves 2013). Finally, Crysmann (2001), Yatabe (2003), Beavers & Sag (2004), and Chaves (2006) argue that coordination of unlikes can be explained by a deletion operation that omits the left periphery of non-initial coordinands, illustrated in (67).


	- b. He drinks coffee with milk at breakfast and drinks coffee with cream in the evening.<sup>20</sup>
	- c. There was one fatality yesterday, and there were two others on the day before.<sup>21</sup>
	- d. I see the music as both going backward and going forward.<sup>22</sup>

In such a view, the examples in (64) are verbal coordinations where the verb (or the verb and subject) has been deleted (e.g. *Kim is alone and is without money*). The problem is that left-periphery ellipsis cannot fully explain coordination of unlikes phenomena. For example, there is no elliptical analysis of data like (68). Levine (2011) offers arguments against the coercion account of Chaves (2006) and against the existence of left-periphery ellipsis. See Yatabe (2012) for a reply.

	- b. Both tired and in a foul mood, Bob packed his gear and headed North.<sup>24</sup>
	- c. Both poor and a Republican, *no one* can possibly be.<sup>25</sup>
	- d. Dead drunk and yet in complete control of the situation, *no one* can be.<sup>26</sup>

Further problems for an ellipsis account of coordination of unlikes phenomena are posed by the position of the correlative coordinators *both*, *either*, and *neither* in (69).

	- b. It's both odd and in very poor taste to have a fake wedding.
	- c. Who's neither tired nor in a hurry?
	- d. Isn't she either drunk or on medication?

If (69a) is an elliptical coordination like *isn't this both illegal and isn't this a safety hazard*, then the location of *both* is unexpected. Instead of occurring before the

<sup>19</sup>Chaves (2013: 171)

<sup>20</sup>Hudson (1984: 214)

<sup>21</sup>Chaves (2007: 339)

<sup>22</sup>https://www.hdtracks.com/music/artist/view/?id=2418; accessed 2020-04-01.

<sup>23</sup>Chaves (2013: 172)

<sup>24</sup>Chaves (2006: 112)

<sup>25</sup>Chaves (2013: 172)

<sup>26</sup>Levine (2011: 142)


first coordinand, it is realized inside the first coordinand. Crucially, the non-elided counterparts are not grammatical, e.g. \**isn't this both illegal and isn't this a safety hazard?* The same issue is raised by (69b,c). In an elliptical account, one would have to stipulate that *both* can only float in the presence of ellipsis, which is unmotivated. Finally, see Mouret (2007) for an extensive discussion in favor of a non-elliptical analysis of unlike coordination, based on correlative coordination. In sum, left-periphery ellipsis does not offer a complete account of coordination of unlikes, and underspecification accounts are more promising.

# **7 Non-constituent coordination**

The fact that not all coordination of unlike categories can be reduced to deletion does not entail that deletion is impossible, or that no phenomena involve deletion. We refer the reader to Nykiel & Kim (2021), Chapter 19 of this volume for more discussion about ellipsis.

Consider, for example, the non-constituent coordinations in (70).

	- b. Tom loves and Mary absolutely hates spinach dip. (Right-Node Raising)
	- c. Tom knows how to cook pizza, and Fred spaghetti. (Gapping)

Some authors regard Argument Cluster Coordination as elliptical (Yatabe 2001; Crysmann 2004; Beavers & Sag 2004); others regard such phenomena as non-elliptical sequences (Mouret 2006). In the former approach, phonological material in the left periphery of the non-initial coordinand that is identical to phonological material in the left periphery of the initial coordinand is allowed to be absent in the mother node. This can be achieved by adding the constraints in (71) to the coordination construction, here shown in the binary-branching format for perspicuity. Here, *coord* is an abbreviation of the phonologies of coordinators, like *and*, *or*, etc.

```
(71) coord-phrase ⇒
     [PHON 1 ⊕ 2 ⊕ 3 ⊕ 4
      DTRS ⟨[PHON 1 ⊕ 2 ne-list
             SYNSEM|LOC|CAT|COORD none],
            [PHON 3 coord ⊕ 1 ⊕ 4 ne-list
             SYNSEM|LOC|CAT|COORD crd]⟩]
```


If 1 is resolved as the empty list then no ellipsis occurs, but if 1 is non-empty then ellipsis occurs, as illustrated in Figure 11. Some accounts, like Yatabe (2001), Crysmann (2004), Beavers & Sag (2004), and Chaves (2008) operate on linearization domain elements instead of directly on PHON. See Müller (2021: Section 6), Chapter 10 of this volume for more discussion about linearization theory.

```
VP [PHON 1 ⊕ 2 ⊕ 3 ⊕ 4]
├── VP [PHON 1⟨give⟩ ⊕ 2⟨a, book, to, Mary⟩]
└── Coord VP [PHON 3⟨and⟩ ⊕ 1⟨give⟩ ⊕ 4⟨a, magazine, to, Sue⟩]
```

Figure 11: Left-periphery ellipsis in *give a book to Mary and a magazine to Sue*
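
A few lines of Python make the PHON arithmetic of (71) concrete (a sketch with our own helper names, and phonology simplified to lists of words). The choice of the shared prefix 1 is left underspecified by the constraint, which is precisely what gives rise to ambiguities like those in (72) below:

```
COORDINATORS = {"and", "or"}

def coord_phons(d1_phon, d2_phon):
    """Return every mother PHON that (71) licenses for two daughters:
    d2_phon must start with a coordinator (list 3); any prefix of d1_phon
    that is repeated right after it (list 1) goes unpronounced."""
    three, rest = d2_phon[:1], d2_phon[1:]
    assert three[0] in COORDINATORS
    mothers = []
    for n in range(len(d1_phon) + 1):
        if d1_phon[:n] == rest[:n]:              # candidate shared list 1
            one, two, four = d1_phon[:n], d1_phon[n:], rest[n:]
            mothers.append(one + two + three + four)   # 1 + 2 + 3 + 4
    return mothers

for phon in coord_phons(["give", "a", "book", "to", "Mary"],
                        ["and", "give", "a", "magazine", "to", "Sue"]):
    print(" ".join(phon))
# "give a book to Mary and give a magazine to Sue"  (1 empty: no ellipsis)
# "give a book to Mary and a magazine to Sue"       (1 = <give>)
# "give a book to Mary and magazine to Sue"         (1 = <give, a>)
```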

This approach is motivated by the existence of ambiguity in sentences like (72); see Beavers & Sag (2004) and Chaves (2006) for more examples and discussion. Because (72a) involves a one-time predicate, the ellipsis must include the subject phrase, otherwise the interpretation is such that the same two trees were cut down twice. In contrast, (72b) does not involve a one-time predicate, and thus it is possible for the ellipsis to simply involve the verb.

	- b. Two trees were photographed by Robin in July and by Alex in September. (Two trees were photographed by Robin in July and photographed by Alex in September)

In the non-elliptical analysis of such data, the missing material is recovered from the preceding coordinand. For example, Mouret (2006: 263) proposes a rule along the lines of (73). Here, a new head feature CLUSTER is introduced, which takes as its value the list of SYNSEM values of the daughters.

```
(73) argument-cluster-phrase ⇒
     [HEAD|CLUSTER ⟨1, …, n⟩
      DTRS ⟨[SYNSEM 1], …, [SYNSEM n]⟩]
```

Mouret defines argument clusters as instances of the underspecified non-headed construction *argument-cluster-phrase* with one or more daughters. The construction is valence saturated and clusters can be coordinated with one another. He


also postulates a lexical rule allowing (for example) a ditransitive verb to take a coordination of clusters as complement (this rule will also allow clusters for complements and adjuncts, assuming the latter are included in the COMPS list):

```
(74) [COMPS ⟨[LOC|CAT 1], …, [LOC|CAT n]⟩] ↦
     [COMPS ⟨[COORD +
              HEAD|CLUSTER ⟨[LOC|CAT 1], …, [LOC|CAT n]⟩]⟩]
```

Figure 12 shows the analysis of the VP in (70a). The respective NPs and PPs form a cluster that is licensed by (73). The phrases *a book to Mary* and *a magazine to Sue* are coordinated and the respective CLUSTER values matched (see Mouret 2006: 263 for details on this matching). The lexical item for *give* is licensed by the lexical rule in (74). This version of *give* selects the cluster coordination rather than selecting the NP and PP directly.

Figure 12: Mouret's (2006) analysis of Argument Cluster Coordination
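
The division of labor between (73) and (74) can be sketched as follows (our own encoding, with categories as strings and signs as dicts; the helper names are not Mouret's):

```
def argument_cluster(*daughter_cats):
    """(73): a non-headed phrase whose CLUSTER lists its daughters."""
    return {"CLUSTER": list(daughter_cats)}

def cluster_taking(verb):
    """Lexical rule (74): e.g. give -> give selecting a coordination of
    clusters instead of its NP and PP complements directly."""
    return {"ORTH": verb["ORTH"],
            "COMPS": [{"COORD": True, "CLUSTER": verb["COMPS"]}]}

give = {"ORTH": "give", "COMPS": ["NP", "PP[to]"]}
give_acc = cluster_taking(give)

cluster1 = argument_cluster("NP", "PP[to]")   # "a book to Mary"
cluster2 = argument_cluster("NP", "PP[to]")   # "a magazine to Sue"
# Coordination requires the clusters to share their CLUSTER value, which
# must match the verb's original COMPS categories element by element:
assert cluster1["CLUSTER"] == cluster2["CLUSTER"] == give_acc["COMPS"][0]["CLUSTER"]
```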

This approach is motivated by non-clausal coordinators (*as well as* and its French equivalent *ainsi que*), which are possible in Argument Cluster Coordination, but cannot conjoin tensed VPs:

	- b. \* John gave a book to Mary as well as gave a magazine to Sue.


	- … livre à Jean. (French)
		book to Jean
		'Paul will offer a record to Marie as well as will offer a book to Jean.'

Another argument is the placement of correlative coordinators: the first coordinator in (76a) must be postverbal; this shows that Argument Cluster Coordination does not include the first verb. The examples below are from Mouret (2006: 254).

(76) a. Jean a donné et un livre à Marie et un magazine à Sue. (French)
		Jean has given and a book to Marie and a magazine to Sue
		'Jean has given both a book to Marie and a magazine to Sue.'


Another argument is negation placement, which is a case of constituent negation (Mouret 2006: 253):

(77) a. Paul offrira un disque à Marie et (non) pas un livre à Jean. (French)
		Paul offer.FUT.3SG a record to Marie and not not a book to Jean
		'Paul will offer a record to Marie and not a book to Jean.'

<sup>27</sup>Mouret (2006: 253)



A syntactic and non-elliptical account of Right-Node Raising is harder to maintain given that this phenomenon does not seem to be sensitive to syntactic structure, as (78) shows. See Bresnan (1974), Wexler & Culicover (1980: 299), Grosu (1981: 45), McCawley (1982: 98–101), and Sabbagh (2007: 382, fn. 30) for more data and discussion.<sup>28</sup> In the examples that follow, small capital letters indicate prosodic focus and material shared between both coordinands is delineated by square brackets.

	- b. John wonders when Bob Dylan WROTE and Mary wants to know when he RECORDED [his great song about the death of Emmet Till].
	- c. Politicians WIN WHEN THEY DEFEND and LOSE WHEN THEY ATTACK [the right of a woman to an abortion].
	- d. Lucy CLAIMED that but COULDN'T SAY exactly when [the strike would take place].
	- e. I found a box IN which and Andrea found a blanket UNDER which [a cat could sleep peacefully for hours without being noticed].

Another source of evidence against syntactic and non-elliptical accounts of Right-Node Raising is that this phenomenon can involve lexical structure, as the examples in (79) by Huddleston et al. (2002: 1325, fn. 44) and Chaves (2008; 2014) illustrate:

	- b. It is neither UN- nor OVERLY [patriotic] to tread that path.<sup>30</sup>
	- c. The EX- or CURRENT [smokers] had a higher blood pressure.<sup>31</sup>
	- d. The NEURO- and COGNITIVE [sciences] are presently in a state of rapid development […]<sup>32</sup>
	- e. Are you talking about A NEW or about AN EX-[boyfriend]?<sup>33</sup>

<sup>28</sup>Steedman (1985: 542; 1990: 256; 2000: 17) and Dowty (1988: 183–184) claim that Right-Node Raising is syntactically bounded. See Phillips (1996: 95) and Chaves (2014: 841) for rebuttals.

<sup>29</sup>Huddleston et al. (2002: 1325, fn. 44)

<sup>30</sup>Chaves (2008: 267)

<sup>31</sup>Chaves (2008: 267)

<sup>32</sup>https://opinionator.blogs.nytimes.com/2011/12/25/the-future-of-moral-machines/; accessed 2021-01-19.

<sup>33</sup>Chaves (2014: 867)


Elliptical accounts of Right-Node Raising are proposed by Beavers & Sag (2004), Yatabe (2004), Chaves (2014), and others. The rule in (80) illustrates the account adopted by Chaves (2014: 874) and Shiraïshi, Abeillé, Hemforth & Miller (2019: 19) in simplified format.<sup>34</sup> In a nutshell, the M(ORPHO-)P(HONOLOGY) feature introduces two list-valued features, namely PHON(OLOGY) and L(EXICAL-)ID(ENTIFIER). The former encodes phonological content, including phonological phrasing information, whereas the latter is used to individuate lexical items semantically (i.e. the value of LID is a list of semantic frames that canonically specify the meaning of a lexeme).

```
(80) right-peripheral-ellipsis-phrase ⇒
     [MP 1 ⊕ 2 ⊕ ⟨[PHON ⟨p1⟩, LID l1], …, [PHON ⟨pn⟩, LID ln]⟩ ⊕ 3
      SYNSEM 4
      DTRS ⟨[MP 1 ⊕ ⟨[PHON ⟨p1⟩, LID l1], …, [PHON ⟨pn⟩, LID ln]⟩
                ⊕ 2 ⊕ ⟨[PHON ⟨p1⟩, LID l1], …, [PHON ⟨pn⟩, LID ln]⟩ ⊕ 3
             SYNSEM 4]⟩]
```

By requiring PHON identity, this rule ensures that Right-Node Raising only targets strings that are phonologically independent and have the same surface form, ruling out the ungrammatical examples in (81). The assumption here is that the value of PHON is not simply a list of phonemes, but rather a structured list containing intonational phrases, phonological phrases, prosodic words, syllables, and segments.

Stressed pronouns, affixes that correspond to independent prosodic words, and compound parts can be Right-Node Raised because they are independent prosodic units in their local domains. See Swingle (1995) for more discussion.

	- b. \* I think that I'D and I know that PAT'LL [buy those portraits of Elvis].
	- c. \* They've always WANTED a and so I've GIVEN THEM a [coffee grinder].
	- d. \* I bought EVERY RED and Jo liked SOME BLUE [t-shirt].

By requiring LID identity, the rule prevents homophonous strings that have fundamentally different semantics from being Right-Node Raised, as in (82). In such cases, oddness arises, because in general the same phrase cannot simultaneously have two meanings, except in puns (Zaenen & Karttunen 1984: 316).
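
The effect of (80) can be sketched as follows (a Python toy with MP lists simplified to (PHON, LID) pairs; the helper names are ours, not the chapter's):

```
def rnr(one, shared, two, three=()):
    """Build the daughter and mother MP lists of (80): the daughter
    contains the shared material twice (with identical PHON and LID), and
    the mother leaves the medial occurrence unpronounced. An empty `three`
    gives continuous Right-Node Raising, as in Figure 13; a non-empty one
    gives the discontinuous cases in (84)."""
    daughter = [*one, *shared, *two, *shared, *three]
    mother = [*one, *two, *shared, *three]
    return daughter, mother

daughter, mother = rnr(one=[("/kɪm lɑɪks/", "like")],
                       shared=[("/beɪɡəlz/", "bagel")],
                       two=[("/ænd mijə heɪts/", "hate")])
assert [p for p, _ in mother] == ["/kɪm lɑɪks/", "/ænd mijə heɪts/", "/beɪɡəlz/"]
```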

<sup>34</sup>See Chaves (2014) for more details about how "cumulative" Right-Node Raising is modeled by this rule, i.e. cases like *Mia donated – and Fred spent –* (*a total of* ) *\$10,000* (between them).


	- b. \* Robin SWUNG and Leslie TAMED [an unusual bat].<sup>36</sup>
	- c. \* We need new BLACK- and FLOOR[boards].<sup>37</sup>
	- d. \* I caught BUTTER- and FIRE[flies].<sup>38</sup>
	- e. \* There stood a ONE- and WELL-[armed man].<sup>39</sup>

At the same time, LID identity does not go as far as requiring co-referentiality of the shared material. This is as intended, given ambiguous examples like *Chris LIKES and Bill LOVES* [*his bike*]. The account of Right-Node Raising is illustrated below. Here, *ι* corresponds to an intonational phrase and *φ* to a phonological phrase. Note that this is a unary-branching rule, which means that it can in principle apply to any phrasal node, including non-coordinate cases of Right-Node Raising:

```
S [MP ⟨[PHON /kɪm lɑɪks/], [PHON /ænd mijə heɪts/], [PHON /beɪɡəlz/]⟩]
 |
S [MP ⟨[PHON /kɪm lɑɪks/], [PHON /beɪɡəlz/], [PHON /ænd mijə heɪts/], [PHON /beɪɡəlz/]⟩]
 |
Kim LIKES bagels and Mia HATES bagels.
```

Figure 13: Analysis of *Kim likes, and Mia hates, bagels.*

	- b. Anyone who MEETS really comes to LIKE [our sales people].<sup>41</sup>

<sup>35</sup>Milward (1994: 936)

<sup>36</sup>Levine & Hukari (2006: 156)

<sup>37</sup>adapted from Artstein (2005: 371)

<sup>38</sup>Chaves (2008: 274)

<sup>39</sup>Chaves (2014: 869)

<sup>40</sup>Hudson (1976: 550)

<sup>41</sup>adapted from Williams (1990: 267)



In the example in Figure 13, the sub-list 3 in (80) is resolved as the empty list, but this need not be so. When the final sub-list is not resolved as the empty list, we obtain discontinuous Right-Node Raising cases like (84), due to Whitman (2009: 238–240) and Chaves (2014: 868), where the Right-Node Raised expression is followed by extra material.

	- b. During the War of 1812, American troops OCCUPIED and BURNED [the town] to the ground.
	- c. Please move from the exit rows if you are UNWILLING or UNABLE [to perform the necessary actions] without injury.
	- d. The troops that OCCUPIED ended up BURNING [the town] to the ground.

Finally, let us now turn our attention to Gapping, as in *Robin likes Sam and Tim* \_ *Sue*. There are elliptical accounts of Gapping (Chaves 2006) as well as direct-interpretation accounts where the missing material is recovered from the preceding linguistic context (Mouret 2006; Abeillé et al. 2014; Park 2019); see Nykiel & Kim (2021: Section 6.1), Chapter 19 of this volume. The latter is illustrated in Figure 14, in simplified format. Basically, the Question Under Discussion (QUD, Roberts 1996) of the first clause is *λx.λy.∃e*(like(*x*, *y*)), which is information that is shared across the clausal daughters as 1. This allows the second coordinand to combine the two NPs with the verbal semantics, and recover the propositional meaning.
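
A toy rendering of this sharing (our own encoding, with propositions as tuples and the event quantification omitted):

```
# The shared QUD of the first clause, standing in for λx.λy.∃e(like(x, y)):
qud = lambda x, y: ("like", x, y)

first = qud("robin", "sam")    # "Robin likes Sam": a full clause
second = qud("tim", "sue")     # "Tim _ Sue": two NPs plus the shared QUD

assert second == ("like", "tim", "sue")
```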

<sup>42</sup>Postal (1994: 101)

<sup>43</sup>Postal (1994: 104)

<sup>44</sup>Huddleston et al. (2002: 1344)

<sup>45</sup>Chaves (2014: 840)

Figure 14: Analysis of *Robin likes Sam and Tim – Sue* (abbreviated)

Like Right-Node Raising, Gapping is not restricted to coordinate structures as Park's (2019: 30–31) attested examples in (85) illustrate, contrary to widespread assumption. Thus, the Gapping rule proposed by Park (2019: 125) that allows a gapped clause to follow a non-gapped clause is not specific to coordination.

(85) a. Robin speaks French better than Leslie \_ German.


# **8 Conclusion**

Coordination is a pervasive phenomenon in all natural languages. Despite intensive research in the last 70 years, its empirical properties continue to challenge most linguistic theories: the coordination lexemes play a crucial role but do not behave like usual syntactic heads, the coordinands do not need to be identical but display some parallelism relations and can be unlimited in number, some non-constituent sequences can be coordinated, peculiar ellipsis phenomena can optionally occur, etc. We have shown how HPSG offers precise, detailed analyses of various coordinate constructions for a wide variety of languages, factoring out


the common properties shared by other constructions and the properties specific to coordination.

Central to the HPSG analyses are two main ideas: (i) coordination structures are non-headed phrases and come with different subtypes, and (ii) the parallelism between coordinate daughters is captured by feature sharing. From these ideas, specific properties can be derived, regarding extraction and agreement, for instance. Nevertheless, there is no clear consensus about some remaining issues. In some accounts, the coordinator is a weak head, whereas in others it is a marker. Coordinate structures are binary branching in some accounts but not so in others. Agreement is always local (with the whole coordinate phrase) in some approaches, whereas locality is abandoned by others to account for Closest Conjunct Agreement. Finally, in some accounts, non-constituent coordination involves some form of deletion, but in others no deletion operation is assumed.

# **Acknowledgments**

We are thankful to Bob Borsley, Jean-Pierre Koenig, Stefan Müller, and other reviewers for comments and suggestions on earlier drafts. As usual, all errors and omissions are our own.

# **References**





//cslipublications.stanford.edu/HPSG/2014/alqurashi-borsley.pdf (10 February, 2021).







Stanford, CA: CSLI Publications. http://csli-publications.stanford.edu/HPSG/ 2004/crysmann.pdf (10 February, 2021).







ciation for Computational Linguistics. https://www.aclweb.org/anthology/ events/coling-1994/ (10 February, 2021).







*ifornia, Berkeley, 22–23 July, 2000*, 325–344. Stanford, CA: CSLI Publications. http://csli-publications.stanford.edu/HPSG/2000/ (29 January, 2020).


# **Chapter 17**

# **Idioms**

# Manfred Sailer

Goethe-Universität Frankfurt

This chapter first sketches basic empirical properties of idioms. The state of the art before the emergence of HPSG is presented, followed by a discussion of four types of HPSG approaches to idioms. A section on future research closes the discussion.

# **1 Introduction**

In this chapter, I will use the term *idiom* interchangeably with broader terms such as *phraseme*, *phraseologism*, *phraseological unit*, or *multiword expression*. This means that I will subsume under this notion expressions such as prototypical idioms (*kick the bucket* 'die'), support verb constructions (*take advantage*), formulaic expressions (*Good morning!*), and many more.<sup>1</sup> The main focus of the discussion will, however, be on prototypical idioms.

I will sketch some empirical aspects of idioms in Section 2. In Section 3, I will present the theoretical context within which idiom analyses arose in HPSG. An overview of the development within HPSG will be given in Section 4. Desiderata for future research are mentioned in Section 5, before I close with a short conclusion.

# **2 Empirical domain**

In the context of the present handbook, the most useful characterization of idioms might be the definition of *multiword expression* from Baldwin & Kim (2010: 269).

<sup>1</sup> I will provide a paraphrase for all idioms at their first mention. They are also listed in the appendix, together with their paraphrase and a remark on which aspects of the idiom are discussed in the text.



For them, any combination of words counts as a multiword expression if it is syntactically complex and shows some degree of *idiomaticity* (i.e., irregularity), be it lexical, syntactic, semantic, pragmatic, or statistical.<sup>2</sup> I speak of a "combination of words" in the sense of a *substantive* or *lexically filled idiom*, which contrasts with *formal* or *lexically open idioms* (Fillmore et al. 1988: 505).

Baldwin & Kim's criteria can help us structure the data presentation in this section, expanding their criteria where it seems suitable. My expansions concern the aspect known as *fixedness* in the phraseological tradition as in Fleischer (1997).<sup>3</sup>

For Baldwin & Kim (2010), *lexical idiosyncrasy* concerns expressions with words that only occur in an idiom, so-called *phraseologically bound words*, or *cranberry words* (Aronoff 1976: 15). Examples include *make headway* 'make progress', *take umbrage* 'take offense', *in a trice* 'in a moment/very quickly'.<sup>4</sup> For such expressions, the grammar has to make sure that the bound word does not occur outside the idiom, i.e., we need to prevent combinations such as (1b).<sup>5</sup>

(1) b. \* It just took them a trice to fix the problem.

We can expand this type of idiosyncrasy to include a second important property of idioms. Most idioms have a fixed inventory of words. In their summary of this aspect of idioms, Gibbs, Jr. & Colston (2007: 827–828) include the following examples: *kick the bucket* means 'die', but *kick the pail*, *punt the bucket*, or *punt the pail* do not have this meaning. However, some degree of lexical variation seems to be allowed, as the idiom *break the ice* 'relieve tension in a strained situation' can be varied into *shatter the ice*.<sup>6</sup> So, a challenge for idiom theories is to guarantee that the right lexical elements are used in the right constellation.

<sup>2</sup> In the phraseological tradition, the aspect of *lexicalization* is added (Fleischer 1997; Burger 1998). This means that an expression is stored in the lexicon. This criterion might have the same coverage as *conventionality* as used in Nunberg et al. (1994: 492). These criteria address the mental representation of idioms as a unit and are, thus, rather psycholinguistic in nature.

<sup>3</sup>Baldwin & Kim (2010) describe idioms in terms of syntactic fixedness, but they seem to consider fixedness a derived notion.

<sup>4</sup>See https://www.english-linguistics.de/codii/, accessed 2019-09-03, for a list of bound words in English and German (Trawiński et al. 2008).

<sup>5</sup>Tom Wasow (p.c.) points out that there are attested uses of many alleged bound words outside their canonical idiom, as in (i). Such uses are, however, rare and restricted.

(i) Not a trice later, the sounds of gunplay were to be heard echoing from Bad Man's Rock. (COCA)

*Syntactic idiomaticity* is used in Baldwin & Kim (2010) to describe expressions that are not formed according to the productive rules of English syntax, following Fillmore et al. (1988), such as *by and large* 'on the whole'/'everything considered' and *trip the light fantastic* 'dance'.

In my expanded use of this notion, syntactic idiomaticity also subsumes irregularities/restrictions in the syntactic flexibility of an idiom, i.e., whether an idiom can occur in the same syntactic constructions as an analogous non-idiomatic combination. In Transformational Grammar, works such as Weinreich (1969) and Fraser (1970) compiled lists of different syntactic transformations and observed that some idioms allow for certain transformations but not for others. This method has been pursued systematically in the framework of *Lexicon-Grammar* (Gross 1982).<sup>7</sup> Sag et al. (2002) distinguish three levels of fixedness: *fixed*, *semi-fixed*, and *flexible*. Completely fixed idioms include *of course* and *ad hoc* and are often called *words with spaces*. Semi-fixed idioms allow for morphosyntactic variation such as inflection. These include some prototypical idioms (*trip the light fantastic*, *kick the bucket*) and complex proper names. In English, semi-fixed idioms show inflection, but they cannot easily be passivized, nor do they allow for parts of the idiom to be topicalized or pronominalized; see (2).

(2) b. \* The bucket was kicked by Alex.
c. \* The bucket, Alex kicked.
d. \* Alex kicked the bucket and Kim kicked it, too.

Flexible idioms pattern with free combinations: for them, we find not only inflection but also passivization, topicalization, pronominalization of parts, etc. This class includes some prototypical idioms (*spill the beans* 'reveal a secret', *pull strings* 'exert influence'/'use one's connections'), but also collocations (*brush one's teeth*) and light verb constructions (*make a mistake*).

<sup>6</sup>While Gibbs, Jr. & Colston (2007), following Gibbs, Jr. et al. (1989), present this example as a lexical variation, Glucksberg (2001: 85), from which it is taken, characterizes it as having a somewhat different aspect of an "abrupt change in the social climate". Clear cases of synonymy under lexical substitution are found with German *wie warme Semmeln/Brötchen/Schrippen weggehen* (lit.: like warm rolls vanish) 'sell like hotcakes' in which some regional terms for rolls can be used in the idiom.

<sup>7</sup>See Laporte (2018) for a recent discussion of applying this method for a classification of idioms.


The assumption of two flexibility classes is not uncontroversial: Horn (2003) distinguishes two types among what Sag et al. (2002) consider flexible idioms. Fraser (1970) assumes six flexibility classes, looking at a wide range of syntactic operations. Ruwet (1991) takes issue with the cross-linguistic applicability of the classification of syntactic operations. Similarly, Schenk (1995) claims that for languages such as Dutch and German, automatic/meaningless syntactic processes other than just inflection are possible for semi-fixed idioms, such as verb-second movement and some types of fronting.

The analytic challenge of syntactic idiomaticity is to capture the difference in flexibility in a non-ad hoc way. It is this aspect of idioms that has received particular attention in Mainstream Generative Grammar (MGG),<sup>8,9</sup> but also in the HPSG approaches sketched in Section 4.

*Semantic idiomaticity* may sound pleonastic, as, traditionally, an expression is called idiomatic if it has a conventional meaning that is different from its literal meaning. Since I use the terms idiom and idiomaticity in their broad senses of phraseological unit and irregularity, respectively, the qualification *semantic* idiom(aticity) is needed.

One challenge of the modeling of idioms is capturing the relation between the literal and the idiomatic meaning of an expression. Gibbs, Jr. & Colston (2007) give an overview of psycholinguistic research on idioms. Whereas it was first assumed that speakers would compute the literal meaning of an expression and then derive the idiomatic meaning, evidence has accumulated that the idiomatic meaning is accessed directly.

Wasow, Nunberg & Sag (1984) and Nunberg, Sag & Wasow (1994) explore various semantic relations for idioms, in particular *decomposability* and *transparency*. An idiom is *decomposable* if its idiomatic meaning can be distributed over its component parts in such a way that we would arrive at the idiomatic meaning of the overall expression if we interpreted the syntactic structure on the basis of such a meaning assignment. The idiomatic meaning of the expression *pull strings* can be decomposed by interpreting *pull* as *exploit/use* and *strings* as *connections*. The expressions *kick the bucket* and *saw logs* 'snore' are not decomposable.

An idiom is *transparent* if there is a synchronically accessible relation between the literal and the idiomatic meaning of an idiom. For some speakers, *saw logs* is transparent in this sense, as the noise produced by this activity is similar to a snoring noise. For *pull strings*, there is an analogy to a puppeteer controlling the puppets' behavior by pulling strings. A non-transparent idiom is called *opaque*.

<sup>8</sup> I follow Culicover & Jackendoff (2005: 3) in using the term *Mainstream Generative Grammar* to refer to work in Minimalism and the earlier Government & Binding framework.

<sup>9</sup>See the references in Corver et al. (2019) for a brief up-to-date overview of MGG work.


Some idioms do not show semantic idiomaticity at all, such as collocations (*brush one's teeth*) or support verb constructions (*take a shower*). Many body-part expressions such as *shake hands* 'greet' or *shake one's head* 'decline/negate' constitute a more complex case: they describe a conventionalized activity and denote the social meaning of this activity.<sup>10</sup>

In addition, we might need to assume a *figurative* interpretation. For some expressions, in particular proverbs or cases like *take the bull by the horns* 'approach a problem directly', we might get a figurative reading rather than an idiomatic reading. Glucksberg (2001) explicitly distinguishes between idiomatic and figurative interpretations. In his view, the above-mentioned case of *shatter the ice* would be a figurative use of the idiom *break the ice*. While there has been a considerable amount of work on figurativity in psycholinguistics, the integration of its results into formal linguistics is still a desideratum.

*Pragmatic idiomaticity* covers expressions that have a *pragmatic point* in the terminology of Fillmore et al. (1988). These include complex formulaic expressions (*Good morning!*). There has been little work on this aspect of idiomaticity in formal phraseology.

The final type of idiomaticity is *statistical idiomaticity*. Contrary to the other idiomaticity criteria, this is a usage-based aspect. If we find a high degree of co-occurrence of a particular combination of words that is idiosyncratic to this combination, we can speak of statistical idiomaticity. This category includes *collocations*. Baldwin & Kim (2010) mention *immaculate performance* as an example. Collocations are important in computational linguistics and in foreign-language learning, but their status for theoretical linguistics and for a competence-oriented framework such as HPSG is unclear.
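To make the notion of idiosyncratic co-occurrence concrete: one standard association measure in computational linguistics – not specific to HPSG, and only one of several options – is pointwise mutual information, which compares the observed probability of a word pair with the probability expected if the words combined freely:

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1 w_2)}{P(w_1)\,P(w_2)}$$

On this view, a combination like *immaculate performance* is statistically idiomatic to the extent that its PMI is markedly higher than that of comparable alternatives such as *spotless performance*.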

This discussion of the various types of idiomaticity shows that idioms do not form a homogeneous empirical domain but rather are defined negatively. This leads to the basic analytical challenge of idioms: while the empirical domain is defined by absence of regularity in at least one aspect, idioms largely obey the principles of grammar. In other words, there is a lot of regularity in the domain of idioms, while any approach still needs to be able to model the irregular properties.

# **3 Predecessors to HPSG analyses of idioms**

In this section, I will sketch the theoretical environment within which HPSG and HPSG analyses of idioms have emerged.

<sup>10</sup>The basic reference for the phraseological properties of body-part expressions is Burger (1976).


The general assumption about idioms in MGG is that they must be represented as a complex phrasal form-meaning unit. Such units are inserted *en bloc* into the structure rather than built by syntactic operations. This view goes back to Chomsky (1965: 190). With this unquestioned assumption, arguments for or against particular analyses can be constructed. To give just one classical example, Chomsky (1981: 85) uses the passivizability of some idioms as an argument for the existence of Deep Structure, i.e., a structure on which the idiom is inserted holistically. Ruwet (1991) and Nunberg et al. (1994) go through a number of such lines of argumentation showing their basic problems.

The holistic view on idioms is most plausible for idioms that show many types of idiomaticity at the same time, though it becomes more and more problematic if only one or a few types of idiomaticity are attested. HPSG is less driven by analytical pre-decisions than other frameworks; see Borsley & Müller (2021: Section 2.1), Chapter 28 of this volume. Nonetheless, idioms have been used to motivate assumptions about the architecture of linguistic signs in HPSG as well.

Wasow et al. (1984) and Nunberg et al. (1994) are probably the two most influential papers in formal phraseology of recent decades. While there are many aspects of Nunberg et al. (1994) that have not been integrated into the formal modeling of idioms, there are at least two insights that have been widely adopted in HPSG. First, not all idioms should be represented holistically. Second, the syntactic flexibility of an idiom is related to its semantic decomposability. In fact, Nunberg et al. (1994) state this last insight even more generally:<sup>11</sup>

We predict that the syntactic flexibility of a particular idiom will ultimately be explained in terms of the compatibility of its semantics with the semantics and pragmatics of various constructions. (Nunberg et al. 1994: 531)

Wasow et al. (1984) and Nunberg et al. (1994) propose a simplified first approach to a theory that would be in line with this quote. They argue that, for English, there is a correlation between syntactic flexibility and semantic decomposability in that non-decomposable idioms are only semi-fixed, whereas decomposable idioms are flexible, to use our terminology from Section 2. This idea has been directly encoded formally in the idiom theory of Gazdar, Klein, Pullum & Sag (1985: Chapter 7), who define the framework of *Generalized Phrase Structure Grammar* (GPSG).

Gazdar et al. (1985) assume that non-decomposable idioms are inserted into sentences *en bloc*, i.e., as fully specified syntactic trees which are assigned the idiomatic meaning holistically. This means that the otherwise strictly context-free grammar of GPSG needs to be expanded by adding a (small) set of larger trees. Since non-decomposable idioms are inserted as units, their parts cannot be accessed for syntactic operations such as passivization or movement. Consequently, the generalization about semantic non-decomposability and syntactic fixedness of English idioms from Wasow et al. (1984) is implemented directly.

<sup>11</sup>Aspects of this approach are already present in Higgins (1974) and Newmeyer (1974).

Decomposable idioms are analyzed as free combinations in syntax. The idiomaticity of such expressions is achieved by two assumptions: First, there is lexical ambiguity, i.e., for an idiom like *pull strings*, the verb *pull* has both a literal meaning and an idiomatic meaning. Similarly for *strings*. Second, Gazdar et al. (1985) assume that lexical items are not necessarily translated into total functions but can be partial functions. Whereas the literal meaning of *pull* might be a total function, the idiomatic meaning of the word would be a partial function that is only defined on elements that are in the denotation of the idiomatic meaning of *strings*. This analysis predicts syntactic flexibility for decomposable idioms, just as proposed in Wasow et al. (1984).
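The partiality mechanism can be sketched as follows; the particular semantic constants are illustrative and do not reproduce the notation of Gazdar et al. (1985):

$$\textit{pull}_i' = \lambda y\, \lambda x.\, \mathbf{exploit}'(x, y), \quad \text{defined only if } y \in \llbracket \textit{strings}_i' \rrbracket$$

Applied to any argument outside the denotation of idiomatic *strings*, the function is undefined, so the idiomatic reading of *pull* can only surface in combination with idiomatic *strings*, however freely the syntax combines the words.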

Nunberg et al. (1994: 511–514) show that the connection between semantic decomposability and syntactic flexibility is not as straightforward as suggested. They say that, in German and Dutch, "noncompositional idioms are syntactically versatile" (Nunberg et al. 1994: 514). Similar observations have been brought forward for French in Ruwet (1991). Bargmann & Sailer (2018) and Fellbaum (2019) argue that even for English, passive examples are attested for non-decomposable idioms such as (3).

(3) Live life to the fullest, you never know when the bucket will be kicked.<sup>12</sup>

The current state of our knowledge of the relation between syntactic and semantic idiosyncrasy is that the semantic idiomaticity of an idiom does have an effect on its syntactic flexibility, though the relation is less direct than assumed in the literature based on Wasow et al. (1984) and Nunberg et al. (1994).

# **4 HPSG analyses of idioms**

HPSG does not make a core-periphery distinction; see Müller (2014). Consequently, idioms belong to the empirical domain to be covered by an HPSG grammar. Nonetheless, idioms are not discussed in Pollard & Sag (1994) and their architecture of grammar does not have a direct place for an analysis of idioms.<sup>13</sup> They situate all idiosyncrasy in the lexicon, which consists of lexical entries for basic words. Every word has to satisfy a lexical entry and all principles of grammar; see Davis & Koenig (2021), Chapter 4 of this volume.<sup>14</sup> All properties of a phrase can be inferred from the properties of the lexical items occurring in the phrase and the constraints of grammar.

<sup>12</sup>Fellbaum (2019: 756)

<sup>13</sup>This section follows the basic structure and argument of Sailer (2012) and Richter & Sailer (2014).

In their grammar, Pollard & Sag (1994) adhere to the *Strong Locality Hypothesis* (SLH), i.e., all lexical entries describe leaf nodes in a syntactic structure and all phrases are constrained by principles that only refer to local (i.e., *synsem*) properties of the phrase and to local properties of its immediate daughters. This hypothesis is summarized in (4).

(4) Strong Locality Hypothesis (SLH)

The rules and principles of grammar are statements on a single node of a linguistic structure or on nodes that are immediately dominated by that node.

This precludes any purely phrasal approaches to idioms. Following the heritage of GPSG, we would assume that all regular aspects of linguistic expressions can be handled by mechanisms that follow the SLH, whereas idiomaticity would be a range of phenomena that may violate it. It is, therefore, remarkable that a grammar framework that denies a core-periphery distinction would start with a strong assumption of locality, and, consequently, of regularity.

This is in sharp contrast to the basic motivation of Construction Grammar, which assumes that constructions can be of arbitrary depth and of an arbitrary degree of idiosyncrasy. Fillmore et al. (1988) use idiom data and the various types of idiosyncrasy discussed in Section 2 as an important motivation for this assumption. To contrast this position clearly with the one taken in Pollard & Sag (1994), I will state the *Strong Non-locality Hypothesis* (SNH) in (5).

(5) Strong Non-locality Hypothesis (SNH)

The internal structure of a construction can be arbitrarily deep and show an arbitrary degree of irregularity at any substructure.

The actual formalism used in Pollard & Sag (1994) and King (1989) – see Richter (2021), Chapter 3 of this volume – does not require the strong versions of the locality and the non-locality hypotheses, but is compatible with weaker versions.

<sup>14</sup>I refer to the lexicon in the technical sense as the collection of lexical entries, i.e., as *descriptions*, rather than as a collection of lexical items, i.e., linguistic signs. Since Pollard & Sag (1994) do not discuss morphological processes, their lexical entries describe full forms. If there is a finite number of such lexical entries, the lexicon can be expressed by a *Word Principle*, a constraint on words that contains a disjunction of all such lexical entries. Once we include morphology, lexical rules, and idiosyncratic, lexicalized phrases in the picture, we need to refine this simplified view.


I will call these the *Weak Locality Hypothesis* (WLH), and the *Weak Non-locality Hypothesis* (WNH); see (6) and (7) respectively.

(6) Weak Locality Hypothesis (WLH)

At most the highest node in a structure is licensed by a rule of grammar or a lexical entry.

According to the WLH, just as in the SLH, each sign needs to be licensed by the lexicon and/or the grammar. This precludes any *en bloc*-insertion analyses, which would be compatible with the SNH. According to the WNH, in contrast to the SLH, a sign can, however, impose further constraints on its component parts that may go beyond local (i.e., *synsem*) properties of its immediate daughters.

(7) Weak Non-locality Hypothesis (WNH)

The rules and principles of grammar can constrain – though not license – the internal structure of a linguistic sign at arbitrary depth.

This means that all substructures of a syntactic node need to be licensed by the grammar, but the node may impose idiosyncratic constraints on which particular well-formed substructures it may contain.

In this section, I will review four types of analyses developed within HPSG in a mildly chronological order: First, I will discuss a conservative extension of Pollard & Sag (1994) for idioms (Krenn & Erbach 1994) that sticks to the SLH. Then, I will look at attempts to incorporate constructional ideas more directly, i.e., ways to include a version of the SNH. The third type of approach will exploit the WLH. Finally, I will summarize recent approaches, which again emphasize the locality of idioms.

# **4.1 Early lexical approaches**

Krenn & Erbach (1994), based on Erbach (1992), present the first comprehensive HPSG account of idioms. They look at a wide variety of different types of German idioms, including support verb constructions. They only modify the architecture of Pollard & Sag (1994) marginally and stick to the Strong Locality Hypothesis. They base their analysis on the apparent correlation between syntactic flexibility and semantic decomposability from Wasow et al. (1984) and Nunberg et al. (1994). Their analysis is a representational variant of the analysis in Gazdar et al. (1985).

To maintain the SLH, Krenn & Erbach (1994) assume that the information available in syntactic selection is slightly richer than what has been assumed in Pollard & Sag (1994): first, they use a lexeme-identification feature, LEXEME, which is located inside the INDEX value and whose value is the semantic constant associated with a lexeme. Second, they include a feature THETA-ROLE, whose value indicates which thematic role a sign is assigned in a structure. In addition to standard thematic roles, they include a dummy value *nil*. Third, as the paper was written in the transition phase between Pollard & Sag (1987) and Pollard & Sag (1994), they assume that the selectional attributes contain complete *sign* objects rather than just *synsem* objects. Consequently, selection for phonological properties and internal constituent structure is possible, which we could consider a violation of the SLH.

The effect of these changes in the analysis of idioms can be seen in (8) and (10). In (8), I sketch the analysis of the syntactically flexible, decomposable idiom *spill the beans*. There are individual lexical items for the idiomatic words.<sup>15</sup>

$$\text{(8)}\quad \text{a.}\; \begin{bmatrix} \text{PHON}\ \left< \textit{spill} \right>\\ \text{SYNSEM|LOC}\ \begin{bmatrix} \text{CAT}\ \begin{bmatrix} \text{SUBJ}\ \left< \text{NP} \right>\\ \text{COMPS}\ \left< \text{NP}\big[\text{LEXEME}\ \textit{beans\_i}\big] \right> \end{bmatrix}\\ \text{CONT}\ \big[\text{REL}\ \textit{spill\_i}\big] \end{bmatrix} \end{bmatrix} \qquad \text{b.}\; \begin{bmatrix} \text{PHON}\ \left< \textit{beans} \right>\\ \text{SYNSEM|LOC|CONT|INDEX|LEXEME}\ \textit{beans\_i} \end{bmatrix}$$

The LEXEME values of these words can be used to distinguish them from their ordinary, non-idiomatic homonyms. Each idiomatic word comes with its idiomatic meaning, which models the decomposability of the expression. For example, the lexical items satisfying the entry in (8a) can undergo lexical rules such as passivization.

The idiomatic verb *spill* selects an NP complement with the LEXEME value *beans\_i*. The lexicon is built in such a way that no other word selects for this LEXEME value. This models the lexical fixedness of the idiom.

The choice of putting the lexical identifier into the INDEX guarantees that it is shared between a lexical head and its phrase, which allows for syntactic flexibility inside the NP. Similarly, the information shared between a trace and its antecedent contains the INDEX value. Consequently, participation in unbounded dependency constructions is equally accounted for. Finally, since a pronoun has the same INDEX value as its antecedent, pronominalization as in (9) is also possible.

(9) Eventually, she spilled all the beans. But it took her a few days to spill them all.<sup>16</sup>

<sup>15</sup>We do not need to specify the REL value for the noun *beans*, as the LEXEME and the REL value are usually identical.

<sup>16</sup>Riehemann (2001: 207)


I sketch the analysis of a non-decomposable, fixed idiom, *kick the bucket*, in (10). In this case, there is only a lexical entry for the syntactic head of the idiom, the verb *kick*. It selects the full phonology of its complement. This blocks any syntactic processes inside this NP. It also follows that the complement cannot be realized as a trace, which blocks extraction.<sup>17</sup> The special THETA-ROLE value *nil* will be used to restrict the lexical rules that can be applied. The passive lexical rule, for example, would be specified in such a way that it cannot apply if the NP complement in its input has this theta-role.

$$\text{(10)}\quad \begin{bmatrix} \text{PHON}\ \left< \textit{kick} \right>\\ \text{SYNSEM|LOC}\ \begin{bmatrix} \text{CAT}\ \begin{bmatrix} \text{SUBJ}\ \left< \text{NP} \right>\\ \text{COMPS}\ \left< \text{NP} \begin{bmatrix} \text{PHON}\ \left< \textit{the, bucket} \right>\\ \text{THETA-ROLE}\ \textit{nil} \end{bmatrix} \right> \end{bmatrix}\\ \text{CONT}\ \big[\text{REL}\ \textit{die}\big] \end{bmatrix} \end{bmatrix}$$

With this analysis, Krenn & Erbach (1994) capture both the idiosyncratic aspects and the regularity of idioms. They show how it generalizes to a wide range of idiom types. I will briefly mention some problems of the approach, though.

There are two problems for the analysis of non-decomposable idioms. First, the approach is too restrictive with respect to the syntactic flexibility of *kick the bucket*, as it excludes cases such as *kick the social/figurative bucket*, which are discussed in Ernst (1981). Second, it is built on equating the class of non-decomposable idioms with that of semi-fixed idioms. As shown in my discussion around example (3), this cannot be maintained.

There are also some undesired properties of the LEXEME value selection. The index identity between a pronoun and its antecedent would require that the subject of the relative clause in (11) has the same INDEX value as the head noun *strings*. However, the account of the lexical fixedness of idioms is built on the assumption that no verb except for the idiomatic *pull* selects for an argument with LEXEME value *strings\_i*.<sup>18</sup>

(11) Parky pulled the strings that got me the job. (McCawley 1981: 137)

Notwithstanding these problems, the analytic ingredients of Krenn & Erbach (1994) constitute the basis of later HPSG analyses. In particular, a mechanism for lexeme-specific selection has been widely assumed in most approaches. The attribute THETA-ROLE can be seen as a simple form of an *inside-out* mechanism, i.e., as a mechanism of encoding information about the larger structure within which a sign appears.

<sup>17</sup>See Borsley & Crysmann (2021), Chapter 13 of this volume for details on the treatment of extraction in HPSG.

<sup>18</sup>Pulman (1993) discusses an analogous problem for the denotational theory of Gazdar et al. (1985).

# **4.2 Phrasal approach**

With the advent of constructional analyses within HPSG, starting with Sag (1997), it is natural to expect phrasal accounts of idioms to emerge as well, as idiomaticity is a central empirical domain for Construction Grammar; see Müller (2021), Chapter 32 of this volume. In this version of HPSG, there is an elaborate type hierarchy below *phrase*. Sag (1997) also introduces *defaults* into HPSG, which play an important role in the treatment of idioms in Constructional HPSG. The clearest phrasal approach to idioms can be found in Riehemann (2001), which incorporates insights from earlier publications such as Riehemann (1997) and Riehemann & Bender (1999). The overall framework of Riehemann (2001) is Constructional HPSG with *Minimal Recursion Semantics* (Copestake et al. 1995; 2005); see also Koenig & Richter (2021: Section 6.1), Chapter 22 of this volume.

For Riehemann, idioms are phrasal units. Consequently, she assumes a subtype of *phrase* for each idiom, such as *spill-beans-idiomatic-phrase* or *kick-bucket-idiomatic-phrase*. The proposal in Riehemann (2001) is phrasal and at the same time obeys the SLH. To achieve this, Riehemann (2001) assumes an attribute WORDS, whose value contains all words dominated by a phrase. This makes it possible to say that a phrase of type *spill-beans-idiomatic-phrase* dominates the words *spill* and *beans*. This is shown in the relevant type constraint for the idiom *spill the beans* in (12).<sup>19</sup>

(12) Constraint on the type *spill-beans-idiomatic-phrase* from Riehemann (2001: 185):

$$\begin{bmatrix} \textit{spill-beans-ip}\\ \text{WORDS}\ \left\{ \begin{bmatrix} \textit{i\_spill}\\ \dots\,\text{LISZT}\ \left< \begin{bmatrix} \textit{i\_spill\_rel}\\ \text{UNDERGOER}\ \fbox{1} \end{bmatrix} \right> \end{bmatrix} <_u \begin{bmatrix} \dots\,\text{LISZT}\ \left< \textit{spill\_rel} \right> \end{bmatrix},\ \begin{bmatrix} \textit{i\_beans}\\ \dots\,\text{LISZT}\ \left< \begin{bmatrix} \textit{i\_beans\_rel}\\ \text{INST}\ \fbox{1} \end{bmatrix} \right> \end{bmatrix} <_u \begin{bmatrix} \dots\,\text{LISZT}\ \left< \textit{beans\_rel} \right> \end{bmatrix},\ \dots \right\} \end{bmatrix}$$

The WORDS value of the idiomatic phrase contains at least two elements, the idiomatic words of type *i\_spill* and *i\_beans*. The special symbol $<_u$ used in this constraint expresses a default. It says that the idiomatic version of the word *spill* is just like its non-idiomatic homonym, except for the parts specified in the left-hand side of the default. In this case, the type of the words and the type of the semantic predicate contributed by the words are changed. Riehemann (2001) only has to introduce the types for the idiomatic words in the type hierarchy but need not specify type constraints on the individual idiomatic words, as these are constrained by the default statement within the constraints on the idioms containing them.

<sup>19</sup>The percolation mechanism for the feature WORDS is rather complex. The fact that entire words are percolated undermines the locality intuition behind the SLH.

As in the account of Krenn & Erbach (1994), the syntactic flexibility of the idiom follows from its free syntactic combination and the fact that all parts of the idiom are assigned an independent semantic contribution. The lexical fixedness is a consequence of the requirement that particular words are dominated by the phrase, namely the idiomatic versions of *spill* and *beans*.

The appeal of the account is particularly clear in its application to non-decomposable, semi-fixed idioms such as *kick the bucket* (Riehemann 2001: 212). For such expressions, the idiomatic words that constitute them are assumed to have an empty semantics, and the meaning of the idiom is contributed as a constructional semantic contribution only by the idiomatic phrase. Since the WORDS list contains entire words, it is also possible to require that the idiomatic word *kick* be in active voice and/or that it take a complement compatible with the description of the idiomatic word *bucket*. This analysis captures the syntactically regular internal structure of this type of idiom and is compatible with the occurrence of modifiers such as *proverbial*. At the same time, it prevents passivization and excludes extraction of the complement, as the SYNSEM value of the idiomatic word *bucket* must be on the COMPS list of the idiomatic word *kick*.<sup>20</sup>
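By analogy with (12), the corresponding type constraint can be sketched roughly as follows; the attribute C-CONT for the construction's own semantic contribution and the empty LISZT specifications are illustrative simplifications rather than Riehemann's (2001) exact notation:

$$\begin{bmatrix} \textit{kick-bucket-ip}\\ \text{WORDS}\ \left\{ \begin{bmatrix} \textit{i\_kick}\\ \dots\,\text{LISZT}\ \left< \right> \end{bmatrix},\ \begin{bmatrix} \textit{i\_bucket}\\ \dots\,\text{LISZT}\ \left< \right> \end{bmatrix},\ \dots \right\}\\ \text{C-CONT|LISZT}\ \left< \textit{die\_rel} \right> \end{bmatrix}$$

Here the idiomatic words contribute nothing semantically, and the relation *die\_rel* is introduced by the phrase itself.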

Riehemann's approach clearly captures the intuition of idioms as phrasal units much better than any other approach in HPSG. However, it faces a number of problems. First, the integration of the approach with Constructional HPSG is done in such a way that the phrasal types for idioms are cross-classified in complex type hierarchies with the various syntactic constructions in which the idiom can appear. This allows Riehemann to account for idiosyncratic differences in the syntactic flexibility of idioms, but the question is whether such an explicit encoding misses generalizations that should follow from independent properties of the components of an idiom and/or of the syntactic construction – in line with the quote from Nunberg et al. (1994) on page 782.

Second, the mechanism of percolating dominated words to each phrase is not compatible with the intuitions of most HPSG researchers. Since no empirical motivation for such a mechanism aside from idioms is provided in Riehemann (2001), this idea has not been pursued in other papers.

<sup>20</sup>This assumes that extracted elements are not members of the valence lists. See Borsley & Crysmann (2021: 544), Chapter 13 of this volume for details.

Third, the question of how to block the free occurrence of idiomatic words, i.e., the occurrence of an idiomatic word without the rest of the idiom, is not solved in Riehemann (2001). While the idiom requires the presence of particular idiomatic words, the occurrence of these words is not restricted.<sup>21</sup> Note that idiomatic words may sometimes be found without the other elements of the idiom – as evidenced by expressions such as *bucket list* 'list of things to do before one dies'. Such data may be considered support for Riehemann's approach; however, the extent to which we find such free occurrences of idiomatic words is extremely small.<sup>22</sup>

Before closing this subsection, I would like to point out that Riehemann (2001) and Riehemann & Bender (1999) are the only HPSG papers on idioms that address the question of statistical idiomaticity, based on the variationist study in Bender (2001). In particular, Riehemann (2001: 297–301) proposes phrasal constructions for collocations even if these do not show any lexical, syntactic, semantic, or pragmatic idiosyncrasy but just a statistical co-occurrence preference. She extends this into a larger plea for an *experience-based HPSG*. Bender (2001) discusses the same idea under the notions of *minimal* versus *maximal* grammars, i.e., grammars that are as free of redundancy as possible to capture the grammatical sentences of a language with their correct meaning versus grammars that might be open to a connection with usage-based approaches to language modeling. Bender (2001: 292) sketches a version of HPSG with frequencies/probabilities attached to lexical and phrasal types.<sup>23</sup>

<sup>21</sup>Since the problem of free occurrences of idiomatic words is not an issue for parsing, versions of Riehemann's approach have been integrated into practical parsing systems (Villavicencio & Copestake 2002); see Bender & Emerson (2021), Chapter 25 of this volume. Similarly, the approach to idioms sketched in Flickinger (2015) is part of a system for parsing and machine translation. Idioms in the source language are identified by bits of semantic representation – analogous to the elements in the WORDS set. This approach, however, does not constitute a theoretical modeling of idioms; it does not exclude ill-formed uses of idioms but identifies potential occurrences of an idiom in the output of a parser.

<sup>22</sup>See the discussion around (1) for a parallel situation with bound words.

<sup>23</sup>An as-yet unexplored solution to the problem of free occurrence of idiomatic words within an experience-based version of HPSG could be to assign the type *idiomatic-word* an extremely low probability of occurring. This might have the effect that such a word can only be used if it is explicitly required in a construction. However, note that neither defaults nor probabilities are a well-defined part of the formal foundations of theoretical work on HPSG; see Richter (2021), Chapter 3 of this volume.


# **4.3 Mixed lexical and phrasal approaches**

While Riehemann (2001) proposes a parallel treatment of decomposable and non-decomposable idioms – and of flexible and semi-fixed idioms – the division between fixed and non-fixed expressions is at the core of another approach, the *two-dimensional theory of idioms*. This approach was first outlined in Sailer (2000) and referred to under this label in Richter & Sailer (2009; 2014). It is intended to combine constructional and collocational approaches to grammar.

The basic intuition behind this approach is that signs have internal and external properties. All properties that are part of the feature structure of a sign are called *internal*. Properties that relate to larger feature structures containing this sign are called its *external* properties. The approach assumes that there is a notion of *regularity* and that anything diverging from it is *idiosyncratic* – or idiomatic, in the terminology of this chapter.

This approach is another attempt to reify the GPSG analysis within HPSG. Sailer (2000) follows the distinction of Nunberg et al. (1994) into non-decomposable and non-flexible idioms on the one hand and decomposable and flexible idioms on the other. The first group is considered internally irregular and receives a constructional analysis in terms of a *phrasal lexical entry*. The second group is considered to consist of independent, smaller lexical units that show an external irregularity in being constrained to co-occur within a larger structure. Idioms of the second group receive a collocational analysis. The two types of irregularity are connected by the *Predictability Hypothesis*, given in (13).

(13) Predictability Hypothesis (Sailer 2000: 366):

For every sign whose internal properties are fully predictable, the distributional behavior of this sign is fully predictable as well.

In the most recent version of this approach, Richter & Sailer (2009; 2014), there is a feature COLL defined on all signs. The value of this feature specifies the type of internal irregularity. The authors assume a cross-classification of regularity and irregularity with respect to syntax, semantics, and phonology – ignoring pragmatic and statistical (ir)regularity in their paper. Every basic lexical entry is defined as completely irregular, as its properties are not predictable. Fully regular phrases such as *read a book* have a trivial value of COLL. A syntactically internally regular but fixed idiom such as *kick the bucket* is classified as having only semantic irregularity, whereas a syntactically irregular expression such as *trip the light fantastic* is of an irregularity type that is a subsort of syntactic and semantic irregularity, but not of phonological irregularity. Following the terminology of Fillmore et al. (1988), this type is called *extra-grammatical-idiom*. The phrasal lexical entry for *trip the light fantastic* is sketched in (14), adjusted to the feature geometry of Sag (1997).

(14) Phrasal lexical entry for the idiom *trip the light fantastic*:

$$\begin{bmatrix} \textit{headed-phrase}\\ \text{PHON}\ \fbox{1} \oplus \left< \textit{the, light, fantastic} \right>\\ \text{HEAD}\ \fbox{2}\\ \dots \end{bmatrix}$$

In (14), the constituent structure of the phrase is not specified, but the phonology is fixed, with the exception of the head daughter's phonological contribution. This accounts for the syntactic irregularity of the idiom. The semantics of the idiom is not related to the semantic contributions of its components, which accounts for the semantic idiomaticity.

Soehn (2006) applies this theory to German. He solves the problem of the relatively large degree of flexibility of non-decomposable idioms in German by using underspecified descriptions of the constituent structure dominated by the idiomatic phrase.

For decomposable idioms, the two-dimensional theory assumes a collocational component. This component is integrated into the value of an attribute REQ, which is only defined on *coll* objects of one of the irregularity types. This encodes the Predictability Hypothesis. The most comprehensive version of this collocational theory is given in Soehn (2009), summarizing and extending ideas from Soehn (2006) and Richter & Soehn (2006). Soehn assumes that collocational requirements can be of various types: a lexical item can be constrained to co-occur with particular *licensers* (or collocates). These can be other lexemes, semantic operators, or phonological units. In addition, the domain within which this licensing has to be satisfied is specified in terms of syntactic barriers, i.e., syntactic nodes dominating the externally irregular item.

To give an example, the idiom *spill the beans* would be analyzed as consisting of two idiomatic words *spill* and *beans* with special LISTEME values *spill-i* and *beans-i*. The idiomatic verb *spill* imposes a lexeme selection on its complement. The idiomatic noun *beans* has a non-empty REQ value, which specifies that it must be selected by a word with LISTEME value *spill-i* within the smallest complete clause dominating it.
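A rough sketch of such a collocational entry for the idiomatic noun is given below; the attribute names LICENSER and BARRIER are chosen here for illustration and do not reproduce Soehn's (2009) exact feature geometry:

$$\begin{bmatrix} \text{PHON}\ \left< \textit{beans} \right>\\ \text{LISTEME}\ \textit{beans-i}\\ \text{COLL|REQ}\ \left< \begin{bmatrix} \text{LICENSER}\ \big[\text{LISTEME}\ \textit{spill-i}\big]\\ \text{BARRIER}\ \textit{complete-clause} \end{bmatrix} \right> \end{bmatrix}$$

The entry is internally unremarkable; all of its idiosyncrasy lies in the requirement that a *spill-i* selector be found within the specified syntactic domain.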

The two-dimensional approach suffers from a number of weaknesses. First, it presupposes a notion of regularity. This assumption is not shared by all linguists. Second, the criteria for whether an expression should be treated constructionally or collocationally are not always clear. Idioms with irregular syntactic structure need to be analyzed constructionally, but this is less clear for non-decomposable idioms with regular syntactic structure such as *kick the bucket*.

# **4.4 Recent lexical approaches**

Kay et al. (2015) marks an important re-orientation in the analysis of idioms: the lexical analysis is extended to all syntactically regular idioms, i.e., to both decomposable (*spill the beans*) and non-decomposable idioms (*kick the bucket*).<sup>24</sup> Kay et al. (2015) achieve a lexical analysis of non-decomposable idioms by two means: (i), an extension of the HPSG selection mechanism, and (ii), the assumption of semantically empty idiomatic words.

As in previous accounts, the relation among idiom parts is established through lexeme-specific selection, using a feature LID (for *lexical identifier*). The authors assume that there is a difference between idiomatic and non-idiomatic LID values. Only heads that are part of idioms themselves can select for idiomatic words.

For the idiom *kick the bucket*, Kay et al. (2015) assume that all meaning is carried by the lexical head, an idiomatic version of *kick*, whereas the other two words, *the* and *bucket*, are meaningless. This meaninglessness allows Kay et al. to block the idiom from occurring in constructions which require meaningful constituents, such as questions, *it*-clefts, middle voice, and others. To exclude passivization, the authors assume that the English passive cannot apply to verbs selecting a semantically empty direct object.
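The resulting entry for the idiomatic *kick* can be sketched roughly as follows; the feature names and their placement are simplified for illustration and do not reproduce Kay et al.'s (2015) precise feature geometry:

$$\begin{bmatrix} \text{PHON}\ \left< \textit{kick} \right>\\ \text{LID}\ \textit{kick-the-bucket-i}\\ \text{COMPS}\ \left< \text{NP}\big[\text{LID}\ \textit{bucket-i},\ \text{SEM}\ \textit{empty}\big] \right>\\ \text{SEM}\ \textit{die}' \end{bmatrix}$$

Since only idiom-internal heads may select idiomatic LID values, the meaningless *bucket* can never occur outside the idiom, and the verb alone carries the meaning 'die'.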

The approach in Kay et al. (2015) is a recent attempt to maintain the SLH as much as possible. Since the SLH has been a major conceptual motivation for Sign-Based Construction Grammar, Kay et al.'s paper is an important contribution showing the empirical robustness of this assumption.

<sup>24</sup>This idea has been previously expressed within a Minimalist perspective by Everaert (2010) and G. Müller (2011: 213–214).


Bargmann & Sailer (2018) propose a similar lexical approach to non-decomposable idioms. They take as their starting point the syntactic flexibility of semantically non-decomposable idioms in English and, in particular, in German. There are two main differences between Kay et al.'s paper and Bargmann & Sailer's: (i), Bargmann & Sailer assume a collocational rather than a purely selectional mechanism to capture lexeme restrictions of idioms, and (ii), they propose a redundant semantics rather than an empty semantics for idiom parts in non-decomposable idioms. In other words, Bargmann & Sailer (2018) propose that both *kick* and *bucket* contribute the semantics of the idiom *kick the bucket*. Bargmann & Sailer argue that the semantic contributions of parts of non-decomposable, syntactically regular idioms are the same across languages, whereas the differences in syntactic flexibility are related to the different syntactic, semantic, and pragmatic constraints imposed on various constructions. To give just one example, while there are barely any restrictions on passive subjects in German, there are strong discourse-structural constraints on passive subjects in English.

Both Kay et al. (2015) and Bargmann & Sailer (2018) attempt to derive the (partial) syntactic inflexibility of non-decomposable idioms from independent properties of the relevant constructions. As such, they subscribe to the programmatic statement of Nunberg et al. (1994) quoted on page 782. In this respect, the extension of the lexical approach from decomposable idioms to all syntactically regular expressions has been a clear step forward.

Findlay (2017) provides a recent discussion and criticism of lexical approaches to idioms in general, which applies in particular to non-decomposable expressions. His reservations comprise the following points. First, there is a massive proliferation of lexical entries for otherwise homophonous words. Second, the lexical analysis does not represent idioms as units, which might make it difficult to connect their theoretical treatment with processing evidence. Findlay refers to psycholinguistic studies, such as Swinney & Cutler (1979), that point to a faster processing of idioms than of free combinations. While the relevance of processing arguments for an HPSG analysis is not clear, I share the basic intuition that idioms, decomposable or not, are a unit and that this should be part of their linguistic representation.

# **5 Where to go from here?**

The final section of this article contains short overviews of research that has been done in areas of phraseology that are outside the main thread of this chapter. I will also identify desiderata.


# **5.1 Neglected phenomena**

Not all types of idioms or idiomaticity mentioned in Section 2 have received an adequate treatment in the (HPSG) literature. I will briefly look at three empirical areas that deserve more attention: neglected types of idiom variation, phraseological patterns, and the literal and non-literal meaning components of idioms.

Most studies on idiom variation have looked at verb- and sentence-related syntactic constructions, such as passive and topicalization. However, not much attention has been paid to lexical variation in idioms. This variation is illustrated by the following examples from Richards (2001: 184, 191).

(15) b. You get the creeps (just looking at him).
c. I have the creeps.

In (15), the alternation of the verb seems to be very systematic – and has been used by Richards (2001) to motivate a lexical decomposition of the verbs involved. A similar argument has been made in Mateu & Espinal (2007) for similar idioms in Catalan. We lack systematic, larger empirical studies of this type of substitution, and it would be important to see how it can be modeled in HPSG. One option would be to capture the *give*–*get*–*have* alternation(s) with lexical rules. Such lexical rules would be different from the standard cases, however, as they would change the lexeme itself rather than just altering its morphosyntactic properties or its semantic contribution.
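To illustrate, a lexical rule along the following lines could map the *give*-variant of such an idiom onto its *get*-variant; the rule format and the decomposed semantic constants are purely illustrative and are not taken from a worked-out analysis:

$$\begin{bmatrix} \textit{word}\\ \text{LID}\ \textit{give}\\ \text{CONT}\ \mathbf{cause\text{-}to\text{-}have}(x, y, \textit{creeps}) \end{bmatrix} \;\mapsto\; \begin{bmatrix} \textit{word}\\ \text{LID}\ \textit{get}\\ \text{CONT}\ \mathbf{come\text{-}to\text{-}have}(y, \textit{creeps}) \end{bmatrix}$$

The unusual feature of such a rule is precisely that it alters the LID value, i.e., the lexeme identity itself.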

In the case mentioned in footnote 6, the alternation consists of substituting a word with a (near) synonym and keeping the meaning of the idiom intact. Again, HPSG seems to have all the required tools to model this phenomenon – for example, by means of hierarchies of LID values. However, the extent of this phenomenon across the set of idioms is not known empirically.

Concerning syntactic variation, the nominal domain has not yet received the attention it might deserve. There is a well-known variation with respect to the marking of possession within idioms. This has been documented for English in Ho (2015), for Modern Hebrew in Almog (2012), and for Modern Greek and German in Markantonatou & Sailer (2016). In German, we find a relatively free alternation between a plain definite and a possessive; see (16a). This is, however, not possible with all idioms; see (16b).

(16) a. Alex hat den / seinen Verstand verloren.
Alex has the / his mind lost
'Alex lost his mind.'


b. Alex hat \*den / ihren Frieden mit der Situation gemacht.
Alex has the / her peace with the situation made
'Alex made her peace with the situation.'

We can also find a free dative in some cases, expressing the possessor. In (17a), a dative possessor may co-occur with a plain definite or a coreferential possessive determiner; in (17b), only the definite article but not the possessive determiner is possible.

(17) b. Alex sollte mir lieber aus den / \*meinen Augen gehen.
Alex should me.DAT rather out.of the / my eyes go
'Alex should rather disappear from my sight.'

While they do not offer a formal encoding, Markantonatou & Sailer (2016) observe that a particular encoding of possession in idioms is only possible if it would also be possible in a free combination. However, an idiom may be idiosyncratically restricted to a subset of the realizations that would be possible in a corresponding free combination. A formalization in HPSG might consist of a treatment of possessively used definite determiners, combined with an analysis of free datives as an extension of a verb's argument structure.<sup>25</sup>

Related to the question of lexical variation are *phraseological patterns*, i.e., very schematic idioms in which the lexical material is largely free. Some examples of phraseological patterns are the *Incredulity Response Construction* as in *What, me worry?* (Akmajian 1984; Lambrecht 1990), or the *What's X doing Y?* construction (Kay & Fillmore 1999). Such patterns are of theoretical importance as they typically involve a non-canonical syntactic pattern. The different locality and non-locality hypotheses introduced above make different predictions. Fillmore et al. (1988) have presented such constructions as a motivation for the non-locality of constructions, i.e., as support of a SNH. However, Kay & Fillmore (1999) show that a lexical analysis might be possible for some cases at least, which they illustrate with the *What's X doing Y?* construction.

Borsley (2004) looks at another phraseological pattern, the *the X-er the Y-er* construction, or *Comparative Correlative Construction* – see Abeillé & Chaves (2021: Section 3.3), Chapter 16 of this volume and Borsley & Crysmann (2021: 553–555), Chapter 13 of this volume. Borsley analyzes this construction by means of two special (local) phrase structure types: one for the comparative *the*-clauses, and one for the overall construction. He shows that (i), the idiosyncrasy of the construction concerns two levels of embedding and is, therefore, non-local; however, (ii), a local analysis is still possible. This approach raises the question of whether the WNH is empirically vacuous, since we can always encode a non-local construction in terms of a series of idiosyncratic local constructions. Clearly, work on more phraseological patterns is needed to assess the various analytical options and their consequences for the architecture of grammar.

<sup>25</sup>See Koenig (1999) for an analysis of possessively interpreted definites and Müller (2018: 68) for an extension of the argument structure as suggested in the main text.

A major challenge for the conceptual and semantic analysis of idioms is the interaction between the literal and the idiomatic meaning. I presented the basic empirical facts in Section 2. All HPSG approaches to idioms so far basically ignore the literal meaning. This position might be justified, as an HPSG grammar should just model the structure and meaning of an utterance and need not worry about the meta-linguistic relations among different lexical items or among different readings of the same (or a homophonous) expression. Nonetheless, this issue touches on an important conceptual point. Addressing it might immediately provide possibilities to connect HPSG research to other disciplines and/or frameworks like Cognitive Linguistics, such as in Dobrovol'skij & Piirainen (2005), and psycholinguistics.

# **5.2 Challenges from other languages**

The majority of work on idioms in HPSG has been done on English and German. As discussed in Section 4.4, the recent trend in HPSG idiom research necessitates a detailed study of individual syntactic structures. Consequently, the restriction to two closely related languages limits the possible phenomena that can be studied concerning idioms. It would be essential to expand the empirical coverage of idiom analyses in HPSG to as many different languages as possible. The larger degree of syntactic flexibility of French, German, and Dutch idioms (Ruwet 1991; Nunberg et al. 1994; Schenk 1995) has led to important refinements of the analysis in Nunberg et al. (1994) and, ultimately, to the lexical analyses of all syntactically regular idioms.

Similarly, the above-mentioned data on possessive alternations only become prominent when languages beyond English are taken into account. Modern Greek, German, and many others show the type of external possessor classified as a European areal phenomenon in Haspelmath (1999). It would be important to look at idioms in languages with other types of external possessors.


In a recent paper, Sheinfux et al. (2019) provide data from Modern Hebrew that show that opacity and figurativity of an idiom are decisive for its syntactic flexibility, rather than decomposability. This result stresses the importance of the literal reading for an adequate account of the syntactic behavior of idioms. It shows that the inclusion of other languages can cause a shift of focus to other types of idioms or other types of idiomaticity.

To add just one more example, HPSG(-related) work on Persian such as Müller (2010) and Samvelian & Faghiri (2016) establishes a clear connection between complex predicates and idioms. Their insights might also lead to a reconsideration of the similarities between light verbs and idioms, as already set out in Krenn & Erbach (1994).

As far as I can see, the following empirical phenomena have not been addressed in HPSG approaches to idioms, as they do not occur in the main object languages for which we have idiom analyses, i.e., English and German. They are, however, common in other languages: the occurrence of clitics in idioms (found in Romance and Greek); aspectual alternations in verbs (Slavic and Greek); argument alternations other than passive and dative alternation, such as anti-passive, causative, inchoative, etc. (in part found in Hebrew and addressed in Sheinfux et al. 2019); and displacement of idiom parts into special syntactic positions (focus position in Hungarian).

Finally, so far, idioms have usually been considered as either offering irregular structures or as being more restricted in their structures than free combinations. In some languages, however, we find archaic syntactic structures and function words in idioms that do not easily fit these two analytic options. To name just a few, Lødrup (2009) argues that Norwegian used to have an external possessor construction similar to that of other European languages, which is preserved only in some idioms. Similarly, Dutch has a number of archaic case inflections in multiword expressions (Kuiper 2018: 129), and there are archaic forms in Modern Greek multiword expressions. It is far from clear what the best way would be to integrate such cases into an HPSG grammar.

# **6 Conclusion**

Idioms are among the topics in linguistics for which HPSG-related publications have had a clear impact on the field and have been widely quoted across frameworks. This handbook article aimed at providing an overview of the development of idiom analyses in HPSG. There seems to be a development towards ever more lexical analyses, starting from the holistic approach for all idioms in Chomsky's work, to a lexical account for all syntactically regular expressions. Notwithstanding the advantages of the lexical analyses, I consider it a basic problem of such approaches that the unit status of idioms is lost. Consequently, I think that the right balance between phrasal and lexical aspects in the analysis of idioms has not yet been fully achieved.

The sign-based character of HPSG seems to be particularly suited for a theory of idioms as it allows one to take into consideration syntactic, semantic, and pragmatic aspects and to use them to constrain the occurrence of idioms appropriately.

# **Abbreviations**


# **Acknowledgments**

I have perceived Ivan A. Sag and his work with various colleagues as a major inspiration for my own work on idioms and multiword expressions. This is clearly reflected in the structure of this paper, too. I apologize for this bias, but I think it is legitimate within an HPSG handbook. I am grateful to Jean-Pierre Koenig, Stefan Müller and Tom Wasow for comments on the outline and the first version of this chapter. I would not have been able to format this chapter without the support of the Language Science Press team, in particular Sebastian Nordhoff. I would like to thank Elizabeth Pankratz for comments and proofreading.


# **Appendix: List of used idioms**

# **English**



# **German**


# **References**




# **Chapter 18**

# **Negation**

# Jong-Bok Kim

Kyung Hee University, Seoul

Each language has a way to express (sentential) negation that reverses the truth value of a sentence, but languages differ in the expressions and grammatical strategies they employ. There are four main types of negatives expressing sentential negation: the adverbial negative, the morphological negative, the negative auxiliary verb, and the preverbal negative. This chapter discusses HPSG analyses of these four strategies for marking sentential negation.

# **1 Modes of expressing negation**

There are four main types of negative markers in expressing negation in languages: the morphological negative, the negative auxiliary verb, the adverbial negative, and the clitic-like preverbal negative (see Dahl 1979; Payne 1985; Zanuttini 2001; Dryer 2005).<sup>1</sup> Each of these types is illustrated in the following:


<sup>1</sup>The term *negator* or *negative marker* is a cover term for any linguistic expression functioning as sentential negation.


d. Gianni non legge articoli di sintassi. (Italian)
Gianni NEG reads articles of syntax
'Gianni doesn't read syntax articles.'

As shown in (1a), languages like Turkish have typical examples of morphological negatives where negation is expressed by an inflectional category realized on the verb by affixation. Meanwhile, languages like Korean employ a negative auxiliary verb as in (1b).<sup>2</sup> The negative auxiliary verb here is marked with basic verbal categories such as agreement, tense, aspect, and mood, while the lexical, main verb remains in an invariant, participle form. The third major way of expressing negation is to use an adverbial negative. This type of negation, forming an independent word, is found in languages like English and French, as given in (1c). In these languages, negatives behave like adverbs in their ordering with respect to the verb.<sup>3</sup> The fourth type is to introduce a preverbal negative. The negative marker in Italian in (1d), preceding a finite verb like other types of clitics in the language, belongs to this type.


In analyzing these four main types of sentential negation, there have been two main strands: derivational and non-derivational views. The derivational view has claimed that the positioning of all of the four types of negatives is basically determined by the interaction of movement operations, a rather large set of functional projections including NegP, and their hierarchically fixed organization. In particular, to account for the fact that, unlike English, French allows main (or lexical) verb inversion, as in (1c), Pollock (1989; 1994) and a number of subsequent researchers have interpreted these contrasts as providing critical motivation for the process of head movement and the existence of functional categories such as MoodP, TP, AgrP, and NegP (see Belletti 1990; Zanuttini 1997; Chomsky 1991; 1993; Lasnik 1995; Haegeman 1995; 1997; Vikner 1997; Zanuttini 2001; Zeijlstra 2015). Within the derivational view, it has thus been widely accepted that the variation between French and English can be explained only in terms of the respective properties of verb movement and its interaction with a view of clause structure organized around functional projections.

Departing from the derivational view, the non-derivational, lexicalist view introduces no uniform syntactic category (e.g., Neg or NegP) for the different types of negatives. This view allows negation to be realized in different grammatical categories, e.g., a morphological suffix, an auxiliary verb, or an adverbial expression.

<sup>2</sup>Korean is peculiar in that it has two ways to express sentential negation: a negative auxiliary (long form negation) and a morphological negative (short form negation). See Kim (2000; 2016) and references therein for details.

<sup>3</sup> In French, the negator *pas* often accompanies the optional preverbal clitic *ne*. See Godard (2004) for detailed discussion of the uses of the clitic *ne*.


For instance, the negative *not* in English is taken to be an adverb like other negative expressions in English (e.g., *never, barely, hardly*). This view has been suggested by Jackendoff (1972: 343–347), Baker (1991: 401), Ernst (1992), Kim (2000: 91), and Warner (2000: 181). In particular, Kim & Sag (1996), Abeillé & Godard (1997), Kim (2000), and Kim & Sag (2002) develop analyses of sentential negation in English, French, Korean, and Italian within the framework of HPSG, showing that the postulation of Neg and its projection NegP creates more empirical and theoretical problems than it solves (see Newmeyer 2006 for this point). In addition, there has been substantial work on negation in other languages within the HPSG framework, which does not resort to the postulation of functional projections or movement operations to account for the various distributional possibilities of negation (see Przepiórkowski & Kupść 1999; Borsley & Jones 2000; Przepiórkowski 2000; Kupść & Przepiórkowski 2002; de Swart & Sag 2002; Borsley & Jones 2005; Crysmann 2010; Bender & Lascarides 2013).

This chapter reviews the HPSG analyses of these four main types of negation, focusing on the distributional possibilities of these four types of negatives in relation to other main constituents of the sentence.<sup>4</sup> When necessary, the chapter also discusses implications for the theory of grammar. It starts with the HPSG analyses of adverbial negatives in English and French, which have been most extensively studied in Transformational Grammars (Section 2), and then moves to the discussion of morphological negatives (Section 3), negative auxiliary verbs (Section 4), and preverbal negatives (Section 5). The chapter also reviews the HPSG analyses of phenomena like genitive of negation and negative concord which are sensitive to the presence of negative expressions (Section 6). The final section concludes this chapter.

# **2 Adverbial negative**

# **2.1 Two key factors**

The most extensively studied type of negation is the adverbial negative, which we find in English and French. There are two main factors that determine the position of an adverbial negative: the finiteness of the verb and its intrinsic properties, namely whether it is an auxiliary or a lexical verb (see Kim 2000: Chapter 3, Kim & Sag 2002).<sup>5</sup>

<sup>4</sup>This chapter grew out of Kim (2000; 2018).

<sup>5</sup>German also employs an adverbial negative *nicht*, which behaves quite differently from the negative in English and French. See Müller (2016: Section 11.7.1) for a detailed review of the previous theoretical analyses of German negation.


First consider the finiteness of the lexical verb that affects the position of adverbial negatives in English and French. English shows us how the finiteness of a verb influences the surface position of the adverbial negative *not*:

	- b. \* Kim not likes Lee.
	- c. \* Kim likes not Lee.
	- b. \* Kim is believed to [like not Mary].

As seen from the data above, the negator *not* can precede an infinitive, but can neither precede nor follow a finite lexical verb (see Baker 1989: Chapter 15, Baker 1991; Ernst 1992). French is not different in this respect. Finiteness also affects the distributional possibilities of the French negative *pas* (see Abeillé & Godard 1997; Kim & Sag 2002; Zeijlstra 2015):


	- b. \* Robin ne pas aime Stacy. (French)
	  Robin NEG NEG likes Stacy
	- b. \* Ne parler pas Français est un grand désavantage en ce cas.
	  NEG to.speak NEG French is a great disadvantage in this case

The data illustrate that the negator *pas* cannot precede a finite verb, but must follow it. Its placement with respect to a non-finite verb is the reverse: *pas* must precede an infinitive.

The second important factor that determines the position of adverbial negatives concerns the presence of an auxiliary or a lexical verb. Modern English displays a clear example where this intrinsic property of the verb influences the position of the English negator *not*: the negator cannot follow a finite lexical verb, as in (6a), but when the finite verb is an auxiliary verb, this ordering is possible, as in (6b) and (6c).


	- b. Kim has not left the town.
	- c. Kim is not leaving the town.

The placement of *pas* in French infinitival clauses is also affected by this intrinsic property of the verb (Kim & Sag 2002: 355):

	- b. N' avoir pas de voiture dans cette ville rend la vie difficile.
	  NEG have NEG a car in this city make the life difficult
	  'Not having a car in this city makes life difficult.'
	- b. N' être pas triste est une condition pour chanter des chansons.
	  NEG be NEG sad is a condition for singing of songs
	  'Not being sad is a condition for singing songs.'

The negator *pas* can either follow or precede an infinitive auxiliary verb, although the acceptability of the ordering in (7b) and (8b) is restricted to certain conservative varieties of French.

In capturing the distributional behavior of such adverbial negatives in English and French, as noted earlier, the derivational view (exemplified by Pollock 1989 and Chomsky 1991) has relied on the notion of verb movement and functional projections. The most appealing aspect of this view (initially at least) is that it can provide an analysis of the systematic variation between English and French. By simply assuming that the two languages have different scopes of verb movement – in English only auxiliary verbs move to a higher functional projection, whereas all French verbs undergo this process – the derivational view could explain why the French negator *pas* follows a finite verb, unlike the English negator *not*. In order for this system to succeed, nontrivial complications are required in the basic components of the grammar, e.g., rather questionable subtheories (see Kim 2000: Chapter 3 and Kim & Sag 2002 for detailed discussion).

Meanwhile, the non-derivational, lexicalist analyses of HPSG license all surface structures by the system of phrase types and constraints. That is, the position of adverbial negatives is taken to be determined not by the respective properties of verb movement, but by their lexical properties, the morphosyntactic (finiteness) features of the verbal head, and independently motivated Linear Precedence (LP) constraints, as we will see in the following discussion.

# **2.2 Constituent negation**

When English *not* negates an embedded constituent, it behaves much like the negative adverb *never*. The similarity between *not* and *never* is particularly clear in non-finite verbal constructions (participle, infinitival, and bare verb phrases), as illustrated in (9) and (10) (see Klima 1964; Kim 2000, Kim & Michaelis 2020: 199):

	- b. We asked him [never [to try to read the book]].
	- c. Duty made them [never [miss the weekly meeting]].
	- b. We asked him [not [to try to read the book]].
	- c. Duty made them [not [miss the weekly meeting]].

French *ne-pas* is no different in this regard. *Ne-pas* and certain other adverbs precede an infinitival VP:

	- b. [Régulièrement [repeindre sa maison]] est une nécessité.
	  regularly to.paint one's house is a necessity
	  'Regularly painting one's house is a necessity.'

To capture such distributional possibilities, Kim (2000) and Kim & Sag (2002) regard *not* and *ne-pas* as adverbs that modify non-finite VPs, not as heads of their own functional projection as in the derivational view. The analyses view the lexical entries for *ne-pas* and *not* to include at least the information shown in (12).<sup>6</sup>
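In rough outline, this shared information can be given as in (12). This is a simplified sketch: the exact feature geometry varies across the cited analyses, *neg-rel* and ARG are placeholder names for the negative relation and its argument, and 1 tags the content of the modified VP:

(12)

$$\begin{bmatrix}\text{CAT|HEAD}\begin{bmatrix}\textit{adverb}\\\text{MOD VP}\begin{bmatrix}\text{VFORM }\textit{nonfin}\\\text{CONT }\boxed{1}\end{bmatrix}\\\text{PRE-MODIFIER }+\end{bmatrix}\\\text{CONT}\begin{bmatrix}\textit{neg-rel}\\\text{ARG }\boxed{1}\end{bmatrix}\end{bmatrix}$$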

<sup>6</sup>Here I assume that both languages distinguish *fin*(*ite*) and *nonfin*(*ite*) verb forms, but that certain differences exist regarding lower levels of organization. For example, *prp* (*present participle*) is a subtype of *fin* in French, whereas it is a subtype of *nonfin* in English.


The lexical information in (12) specifies that *not* and *ne-pas* modify a non-finite VP and that this modified VP serves as the semantic argument of the negation. This simple lexical specification correctly describes the distributional similarities between English *not* and French *ne-pas*, as seen from the structure in Figure 1.

Figure 1: Structure of constituent negation

The lexical specification as premodifier (PRE-MODIFIER+), together with an LP rule requiring such adjuncts to precede the head they modify (Müller 2021: 375, Chapter 10 of this volume), ensures that both *ne-pas* and *not* precede the VPs that they modify. Since the negator modifies a VP, it follows that the negator does not separate an infinitival verb from its complements, as observed from the following data (Kim & Sag 2002: 356):

	- b. \* [Speaking not English] is a disadvantage.
	- b. \* [Ne parler pas français] est un grand désavantage en ce cas.
	  NEG to.speak NEG French is a great disadvantage in this case


Interacting with the LP constraints, the lexical specification in (12) ensures that the constituent negation precedes the VP it modifies. This predicts the grammaticality of (13a) and (14a), where *ne-pas* and *not* are used as VP[*nonfin*] modifiers. (13b) and (14b) are ungrammatical, since the modifier fails to appear in the required position – i.e., before all elements of the non-finite VP.

The HPSG analyses sketched here have recognized the fact that finiteness plays a crucial role in determining the distributional possibilities of negative adverbs. Its main explanatory capacity has basically come from the proper lexical specification of these negative adverbs. The lexical specification that *pas* and *not* both modify non-finite VPs has sufficed to predict their occurrences in non-finite environments.

# **2.3 Sentential negation**

With respect to negation in finite clauses, there are important differences between English and French. As I have noted earlier, it is a general fact of French that *pas* must follow a finite verb, in which case the verb optionally bears negative morphology (*ne*-marking) (Kim & Sag 2002: 361):

	- b. \* Dominique pas aime Alex.
	  Dominique NEG like Alex

In English, *not* must follow a finite auxiliary verb, not a lexical (or main) verb:

	- b. \* Dominique not does like Alex.
	- c. \* Dominique likes not Alex.

In contrast to its distribution in non-finite clauses, the distribution of *not* in finite clauses concerns sentential negation. The need to distinguish between constituent and sentential negation can be observed from many grammatical environments, including scope possibilities that one can observe in an example like (17) (see Klima 1964; Baker 1991; Warner 2000; Kim & Michaelis 2020: 200).<sup>7</sup>

<sup>7</sup>Warner (2000) and Bender & Lascarides (2013) discuss scopal interactions of negation with auxiliaries (modals) and quantifiers within the system of Minimal Recursion Semantics (MRS). On MRS see also Koenig & Richter (2021: Section 6.1), Chapter 22 of this volume.


(17) The president could not approve the bill.

Negation here could have the two different scope readings paraphrased in the following:

(18) a. It would be possible for the president not to approve the bill.

b. It would not be possible for the president to approve the bill.

The first interpretation is constituent negation; the second is sentential negation.

The need for this distinction also comes from distributional possibilities. The adverb *never* is a true diagnostic of a VP modifier, and I use the observed contrasts between *never* and *not* to reason about what the properties of the negator *not* must be. As noted, sentential *not* cannot modify a finite VP and is thus different from the adverb *never*:

	- b. Lee will never/not leave.

The contrast in these two sentences shows one clear difference between *never* and *not*: the negator *not* cannot precede a finite VP, though it can freely occur as a non-finite VP modifier, whereas *never* can appear in both positions.

Another key difference between *never* and *not* can be found in the VP ellipsis construction. Observe the following contrast (see Warner 2000 and Kim & Sag 2002):<sup>8</sup>

	- b. \* Mary sang a song, but Lee could never \_.
	- c. Mary sang a song, but Lee could not \_.

The data here indicate that *not* can appear after the VP ellipsis auxiliary, but this is not possible with *never*.

We saw the lexical representation for constituent negation *not* in (12) above. Unlike the constituent negator, the sentential negator *not* typically follows a finite auxiliary verb. *Too*, *so*, and *indeed* also behave like this:

	- b. Kim will too/so/indeed read it.

<sup>8</sup>As seen from an attested example like *I, being the size I am, could hide as one of them, whereas she could never*, in a limited context the adverb *never* is stranded after a modal auxiliary, but not after a non-modal auxiliary verb like *be, have* and *do*. Such a stranding seems to be possible when the adverb expresses a contrastive focus meaning.


These expressions are used to reaffirm the truth of the sentence in question and follow a finite auxiliary verb. This suggests that *too*, *so*, *indeed* and the sentential *not* belong to a special class of adverbs (which I call Adv<sup>I</sup> ) that combine with a preceding auxiliary verb (see Kim 2000: 94–95).

Noting the properties of *not* that were discussed so far, the HPSG analyses of Abeillé & Godard (1997), Kim (2000: Section 3.4), and Warner (2000) have taken this group of adverbs (Adv<sup>I</sup> ) including the sentential negation *not* to function as the complement of a finite auxiliary verb via the following lexical rule:<sup>9</sup>

(22) Adverb-Complement Lexical Rule:

$$\begin{bmatrix}\textit{fin-aux}\\\text{SYNSEM|LOC|CAT}\begin{bmatrix}\text{HEAD}\begin{bmatrix}\text{AUX }+\\\text{VFORM }\textit{fin}\end{bmatrix}\\\text{COMPS }\boxed{1}\end{bmatrix}\end{bmatrix}\mapsto\begin{bmatrix}\textit{adv-comp-fin-aux}\\\text{SYNSEM|LOC|CAT|COMPS }\left\langle\text{Adv}^{I}\right\rangle\oplus\boxed{1}\end{bmatrix}$$

This lexical rule specifies that when the input is a finite auxiliary verb, the output is a finite auxiliary (*fin-aux* ↦ *adv-comp-fin-aux*) that selects Adv<sup>I</sup> (including the sentential negator) as an additional complement.<sup>10</sup> This would then license a structure like the one in Figure 2.

As shown in Figure 2, the finite auxiliary verb *could* combines with two complements, the negator *not* (Adv<sup>I</sup> ) and the VP *approve the bill*. This combination results in a well-formed head-complement phrase. By treating *not* as both a modifier (constituent negation) and a lexical complement of a finite auxiliary (sentential negation), it is thus possible to account for the scope differences in (17) with the following two possible structures:

	- a. The president [could [not approve the bill]].
	- b. The president [could] [not] [approve the bill].

In (23a), *not* functions as a modifier to the base VP, while in (23b), whose partial structure is given in Figure 2, it is a sentential negation serving as the complement of *could*.

<sup>9</sup>The symbol ⊕ stands for the relation *append*, i.e., a relation that concatenates two lists. The rule adds the adverb to the COMPS list. More recent variants use the ARG-ST list for valence representations. The rule can be adapted to the ARG-ST format, but for the sake of readability, I stay with the COMPS-based analysis.

<sup>10</sup>As discussed in the following, this type of lexical rule allows us to represent a key difference between English and French, namely that French has no restriction on the feature AUX to introduce the negative adverb *pas* as a finite verb's complement.


Figure 2: Structure of sentential negation

The present analysis allows a simple account of other related phenomena, including the VP ellipsis discussed in (20). The key point was that, unlike *never*, the sentential negation can host a VP ellipsis. The VP ellipsis after *not* is possible, given that any VP complement of an auxiliary verb can be unexpressed, as specified by the following lexical rule (see Kim 2000: 99 and Kim & Michaelis 2020: 209 for similar proposals):

(24) Predicate ellipsis lexical rule:

$$\begin{bmatrix}\textit{adv-comp-fin-aux}\\\text{ARG-ST }\left\langle\boxed{1}\,\text{XP},\;\boxed{2}\,\text{Adv}^{I},\;\text{YP}\right\rangle\end{bmatrix}\mapsto\begin{bmatrix}\textit{aux-ellipsis-wd}\\\text{ARG-ST }\left\langle\boxed{1},\;\boxed{2},\;\text{YP}[\textit{pro}]\right\rangle\end{bmatrix}$$

What the rule in (24) tells us is that an auxiliary verb selecting three arguments can be projected into an elided auxiliary verb (*aux-ellipsis-wd*) whose third argument is realized as a small *pro*, which by definition behaves like a slashed expression in not mapping into the syntactic grammatical function COMPS (see Abeillé & Borsley (2021: Section 4.1), Chapter 1 of this volume and Davis, Koenig & Wechsler (2021: Section 3), Chapter 9 of this volume for mappings from ARG-ST to COMPS). The YP without structure sharing is a shorthand for carrying over all information from the input of the lexical rule to the output, with the exception of the type of the YP AVM: the type at the input is *canonical* and the type at the output is *pro*. This analysis would then license the structure in Figure 3.


Figure 3: A licensed VP ellipsis structure

As represented in Figure 3, the auxiliary verb *could* forms a well-formed head-complement phrase with *not*, while its VP[*bse*] is unrealized (see Kim 2000; Kim & Sells 2008 for detail). The sentential negator *not* can "survive" VP ellipsis because it can be licensed in the syntax as the complement of an auxiliary, independent of the following VP. However, an adverb like *never* is only licensed as a modifier of VP. Thus, if the VP were elided, we would have a hypothetical structure like the one in Figure 4. The adverb *never* modifies a VP through the feature MOD, which guarantees that the adverb requires the head VP that it modifies. In an ellipsis structure, the absence of such a VP means that there is no VP for the adverb to modify. In other words, there is no rule licensing such a combination – predicting the ungrammaticality of \**has never*, as opposed to *has not*.

Figure 4: Ill-formed Head-Adjunct structure

The HPSG analysis just sketched here can be easily extended to French negation, whose data is repeated here.



	- b. Robin (n') aime pas Stacy. (French)
	  Robin NEG likes NEG Stacy
	  'Robin does not like Stacy.'

Unlike the English negator *not*, *pas* must follow a finite verb. Such a distributional contrast has motivated verb movement analyses, as mentioned above (see Pollock 1989; Zanuttini 2001). By contrast, the present HPSG analysis is cast in terms of a lexical rule that maps a finite verb into a verb with a certain adverb like *pas* as an additional complement. The idea of converting modifiers in French into complements has been independently proposed by Miller (1992) and Abeillé & Godard (1997) for French adverbs including *pas*. Building upon this previous work, Kim (2000) and Abeillé & Godard (2002) allow the adverb *pas* to function as a syntactic complement of a finite verb in French.<sup>11</sup> This output verb *neg-fin-v* then allows the negator *pas* to function as the complement of the verb *n'aime*, as represented in Figure 5.
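The rule can be sketched roughly as follows, in parallel to (22) but without the [AUX +] restriction. This is a simplified sketch: the type names follow the prose above, and the optional realization of *ne* (cf. footnote 11) is suppressed:

$$\begin{bmatrix}\textit{fin-v}\\\text{SYNSEM|LOC|CAT}\begin{bmatrix}\text{HEAD|VFORM }\textit{fin}\\\text{COMPS }\boxed{1}\end{bmatrix}\end{bmatrix}\mapsto\begin{bmatrix}\textit{neg-fin-v}\\\text{SYNSEM|LOC|CAT|COMPS }\left\langle\text{Adv}[\textit{pas}]\right\rangle\oplus\boxed{1}\end{bmatrix}$$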

The analysis also explains the position of *pas* in finite clauses. The placement of *pas* before a finite verb in (25a) is unacceptable, since *pas* here is used not as a non-finite VP modifier, but as a finite VP modifier. But in the present analysis which allows *pas*-type negative adverbs to serve as the complement of a finite verb, *pas* in (25b) can be the sister of the finite verb *n'aime*.

Given that the imperative, subjunctive, and even present participle verb forms in French are finite, we can expect that *pas* cannot precede any of these verb forms, which the following examples confirm (Kim 2000: 142):

	- b. \* Ne pas mange ta soupe.
	  NEG NEG eat your soup

<sup>11</sup>Following Abeillé & Godard (2002), one could assume *ne* to be an inflectional affix which can be optionally realized in the output of the lexical rule in Modern French.


Figure 5: Partial structure of (25b)

	- b. \* Il est important que vous ne pas répondiez.
	  it is important that you NEG NEG answer
	- b. \* Ne pas parlant Français, Stacy avait des difficultés.
	  NEG NEG speaking French Stacy had of difficulties

Note that this non-derivational analysis reduces the differences between French and English negation to a matter of lexical properties. The negators *not* and *pas* are identical in that they both are VP[*nonfin*]-modifying adverbs. But they are different with respect to which verbs can select them as complements: *not* can be the complement of a finite auxiliary verb, whereas *pas* can be the complement of any finite verb. So the only difference between *not* and *pas* is the morphosyntactic value [AUX +] of the verb they combine with, and this induces the difference in the positions of the negators in English and French.


# **3 Morphological negative**

As noted earlier, languages like Turkish and Japanese employ morphological negation where the negative marker behaves like a suffix (Kelepir 2001: 171 for Turkish and Kato 1997; 2000 for Japanese). Consider a Turkish and a Japanese example respectively:


As shown by the examples, the sentential negation of Turkish and Japanese employs the morphological suffixes *-me* and *-na*, respectively. It is possible to state the ordering of these morphological negative markers in configurational terms by assigning an independent syntactic status to them. But it is too strong a claim to take the negative suffix *-me* or *-na* to be an independent syntactic element and to attribute its positional possibilities to syntactic constraints such as verb movement and other configurational notions. In these languages, the negative affix acts just like other verbal inflections in numerous respects. The morphological status of these negative markers is supported by their participation in morphophonemic alternations. For example, the vowel of the Turkish negative suffix *-me* shifts from open to closed when followed by the future suffix, as in *gel-mi-yecek* 'come-NEG-FUT'. Their strictly fixed position also indicates their morphological constituenthood. Though these languages allow a rather free permutation of syntactic elements (scrambling), there exist strict ordering restrictions among verbal suffixes, including the negative suffix, as observed in the following:


The strict ordering of the negative affix here is a matter of morphology. If it were a syntactic concern, then the question would arise as to why there is an obvious contrast in the ordering principles of morphological and syntactic constituents, i.e., why the ordering rules of morphology are distinct from the ordering rules of syntax. The simplest explanation for this contrast is to accept the view that morphological constituents including the negative marker are formed in the lexical component and hence have no syntactic status (see Kim 2000: Chapter 2 for detailed discussion).

Given these observations, it is more reasonable to assume that the placement of a negative affix is regulated by morphological principles, i.e., by the properties of the morphological negative affix itself. The process of adding a negative morpheme to a lexeme can be modeled straightforwardly by the following lexical rule (for a similar treatment see Kim 2000: 36; Crowgey 2012: 111–112):

(32) Negative word formation lexical rule:

As shown here, any verb lexeme can be turned into a verb with the negative morpheme attached. That is, the language-particular definition for F*neg* will ensure that an appropriate negative morpheme is attached to the lexeme. For instance, the suffix -*ma* for Turkish and -*na* for Japanese will be attached to the verb lexeme, generating the verb forms in (30a).<sup>12</sup> See Crysmann (2021), Chapter 21 of this volume for details on how the realization of inflectional features is modeled in HPSG.

This morphological analysis can be extended to the negation of languages like Libyan Arabic, as discussed in Borsley & Krer (2012). The language has a bipartite realization of negation, the proclitic *ma-* and the enclitic *-š*:


Following Borsley & Krer (2012: 10), one can treat these clitics as affixes and generate a negative word. Given that the function F*neg* in Libyan Arabic allows the attachment of the negative prefix *ma-* and the suffix *-š* to the verb stem *mšuu*, we would have the following output in accordance with the lexical rule in (32):<sup>13</sup>

<sup>12</sup>In a similar manner, Przepiórkowski & Kupść (1999) and Przepiórkowski (2000; 2001) discuss aspects of Polish negation, which is realized as the prefix *nie* to a verbal expression.

<sup>13</sup>Borsley & Krer (2012) note that the suffix *-š* is not realized when a negative clause includes an n-word or an NPI (negative polarity item). See Borsley & Krer (2012) for further details.


$$\begin{bmatrix}\textit{neg-v-lxm}\\\text{PHON }\left\langle\textit{ma-mšuu-š}\right\rangle\\\text{SYNSEM|LOC}\begin{bmatrix}\text{CAT|HEAD }[\text{NEG }+]\\\text{CONT }\textit{neg-rel}\end{bmatrix}\end{bmatrix}$$

The lexicalist HPSG analyses sketched here have been built upon the thesis that autonomous (i.e., non-syntactic) principles govern the distribution of morphological elements (Bresnan & Mchombo 1995). The position of the morphological negation is simply defined in relation to the verb stem it attaches to. There are no syntactic operations such as head-movement or multiple functional projections in forming a verb with the negative marker.

# **4 Negative auxiliary verb**

Another way of expressing sentential negation, as noted earlier, is to employ a negative auxiliary verb. Some head-final languages like Korean and Hindi employ negative auxiliary verbs. Consider a Korean example:


The negative auxiliary in head-final languages like Korean typically appears clause-finally, following the invariant form of the lexical verb. In head-initial SVO languages, however, the negative auxiliary almost invariably occurs immediately before the lexical verb (Payne 1985: 212). Finnish also exhibits this property (Mitchell 1991: 376):

(36) Minä e-n puhu-isi. (Finnish)
I.NOM NEG-1SG speak-COND
'I would not speak.'

These negative auxiliaries have syntactic status: above all, they can be inflected. Like other verbs, they can be marked with verbal inflections such as agreement, tense, and mood.

In dealing with negative auxiliary constructions, most of the derivational approaches have followed Pollock's and Chomsky's analyses in factoring out grammatical information (such as tense, agreement, and mood) carried by lexical items into various different phrase-structure nodes (see, among others, Hagstrom 2002, Han et al. 2007 for Korean, and Vasishth 2000 for Hindi). This derivational view has been appealing in that the configurational structure for English-type languages could be applied even for languages with different types of negation.



However, issues arise about how to address the grammatical properties of negative auxiliaries, which are quite different from the other negative forms.

The Korean negative auxiliary displays all the key properties of auxiliary verbs in the language. For instance, both the canonical auxiliary verbs and the negative auxiliary alike require the preceding lexical verb to be marked with a specific verb form (VFORM), as illustrated in the following:


The auxiliary verb *siph-* in (37a) requires a *-ko*-marked lexical verb, while the negative auxiliary verb *anh-* in (37b) asks for a *-ci*-marked lexical verb. This shows that the negative is also an auxiliary verb in the language.

In terms of syntactic structure, there are two possible analyses. One is to assume that the negative auxiliary takes a VP complement and the other is to claim that it forms a verb complex with an immediately preceding lexical verb, as represented in Figures 6a and 6b, respectively (Chung 1998; Kim 2016).

The distributional properties of the negative auxiliary in the language support a complex predicate structure (cf. Figure 6b) in which the negative auxiliary verb forms a syntactic/semantic unit with the preceding lexical verb. For instance, no adverbial expression, including a parenthetical adverb, can intervene between the main and the auxiliary verb, as illustrated by the following:


Further, in an elliptical construction, the elements of a verb complex always occur together. Neither the lexical verb (39c) nor the auxiliary verb alone (39d) can serve as a fragment answer to the corresponding polar question:



The lexical verb and the auxiliary must appear together as in (39b). These constituenthood properties indicate that the negative auxiliary forms a syntactic unit with a preceding lexical verb in Korean.

To address these complex verb properties, one could assume that an auxiliary verb forms a complex predicate, licensed by the following schema (see Kim 2016: 95):

(40) HEAD-LIGHT Schema:

$$\begin{bmatrix}\textit{head-light-phrase}\\\text{COMPS }\boxed{1}\\\text{LIGHT }+\\\text{HEAD-DTR }\boxed{2}\\\text{DTRS }\left\langle\boxed{3}\begin{bmatrix}\text{LIGHT }+\\\text{COMPS }\boxed{1}\end{bmatrix},\;\boxed{2}\begin{bmatrix}\text{LIGHT }+\\\text{COMPS }\left\langle\boxed{3}\right\rangle\oplus\boxed{1}\end{bmatrix}\right\rangle\end{bmatrix}$$


This construction schema means that a LIGHT head expression combines with a LIGHT complement, yielding a light, quasi-lexical constituent (Bonami & Webelhuth 2012). When this combination happens, there is a kind of argument composition: the COMPS value of this lexical complement is passed up to the resulting mother. The constructional constraint thus induces the effect of argument composition in syntax, as illustrated by Figure 7. The auxiliary verb *anh-ass-ta* 'NEG-PST-DECL' combines with the matrix verb *ilk-ci* 'read-CONN', creating a well-formed *head-light-phrase*. Note that the resulting construction inherits the COMPS value from that of the lexical complement *ilk-ci* 'read-CONN' in accordance with the structure-sharing imposed by the HEAD-LIGHT Schema in (40). That is, the HEAD-LIGHT Schema licenses the combination of an auxiliary verb with its lexical verb, while inheriting the lexical verb's complement value through argument composition. The present system thus allows argument composition at the syntax level, rather than in the lexicon.

Figure 7: An example structure licensed by the HEAD-LIGHT Schema
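Schematically, the combination in Figure 7 amounts to the following. This is a simplified sketch with case details omitted; 1 tags the inherited COMPS list and 2 the light verbal complement:

$$\begin{bmatrix}\textit{head-light-phrase}\\\text{COMPS }\boxed{1}\left\langle\text{NP[ACC]}\right\rangle\end{bmatrix}\rightarrow\boxed{2}\begin{bmatrix}\textit{ilk-ci}\\\text{LIGHT }+\\\text{COMPS }\boxed{1}\end{bmatrix}\;\begin{bmatrix}\textit{anh-ass-ta}\\\text{LIGHT }+\\\text{COMPS }\left\langle\boxed{2}\right\rangle\oplus\boxed{1}\end{bmatrix}$$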

The HPSG analysis I have outlined has taken the negative auxiliary in Korean to select a lexical verb, the resulting combination forming a verbal complex. The present analysis implies that there is no upper limit for the number of auxiliary verbs to occur in sequence, as long as each combination observes the morphosyntactic constraint on the preceding auxiliary expression. Consider the following:



(41) a. Sakwa-lul [mek-ci anh-ta]. (Korean)
apple-ACC eat-CONN NEG-DECL
'(I/he/she) do/does not eat the apple.'


As seen from the bracketed structures, it is possible to add one more auxiliary verb to an existing HEAD-LIGHT phrase with the final auxiliary bearing an appropriate connective marker. There is no upper limit to the possible number of auxiliary verbs one can add (see Kim 2016: 88 for detailed discussion).

The present analysis in which the negative auxiliary forms a complex predicate structure with a lexical verb can also be applied to languages like Basque, as suggested by Crowgey & Bender (2011). They explore the interplay of sentential negation and word order in Basque. Consider their example (p. 51):


Unlike Korean, the negative auxiliary *ez-ditu* precedes the main verb. Other than this ordering difference, just like Korean, the two form a verb complex structure, as represented in Figure 8.

In the treatment of negative auxiliary verbs, HPSG analyses have taken the negative auxiliary to be an independent lexical verb whose grammatical (syntactic) information is not distributed over different phrase structure nodes, but rather is incorporated into its precise lexical specifications. In particular, in many languages the negative auxiliary forms a verb complex structure whose constituenthood is motivated by independent phenomena.


Figure 8: Negation verb combination in Basque, adapted from Crowgey & Bender (2011: 51)

# **5 Preverbal negative**

The final type of sentence negation is preverbal negatives, which we can observe in languages like Italian and Welsh:

	- b. Dw i ddim wedi gweld neb. (Welsh, Borsley & Jones 2005: 108)
	  am I NEG PRF see nobody
	  'I haven't seen anybody.'

As seen here, the Italian preverbal negative *non* – also called negative particle or clitic – always precedes a lexical verb, whether finite or non-finite, as further attested by the following examples (Kim 2000: Chapter 4):

(44) a. Gianni vuole che io non legga articoli di sintassi. (Italian)
Gianni wants that I NEG read articles of syntax
'Gianni hopes that I do not read syntax articles.'



The derivational view again attributes the distribution of such a preverbal negative to the reflex of verb movement and functional projections (see Belletti 1990: Chapter 1). This line of analysis also appears to be persuasive in that the different scope of verb movement application could explain the observed variations among typologically related languages. Such an analysis, however, fails to capture unique properties of the preverbal negative in contrast to the morphological negative, the negative auxiliary, and the adverbial negative.

Kim (2000) offers an HPSG analysis of Italian and Spanish negation. His analysis takes *non* to be an independent lexical head, even though it is a clitic. This claim follows the analyses sketched by Monachesi (1993) and Monachesi (1998), which assume that there are two types of clitics: affix-like clitics and word-like clitics. Pronominal clitics belong to the former, whereas the clitic *loro* 'to them' belongs to the latter. Kim's analysis suggests that *non* also belongs to the latter group.<sup>14</sup> Treating *non* as a word-like element, as in the following, will allow us to capture its word-like properties, such as the possibility of it bearing stress and its separation from the first verbal element. However, it is not a phrasal modifier, but an independent particle (or clitic) which combines with the following lexical verb (see Kim 2000 for detailed discussion).


(45) Lexical specifications for *non* in Italian:

$$\begin{bmatrix}\text{PHON }\left\langle\textit{non}\right\rangle\\\text{SYNSEM|LOC}\begin{bmatrix}\text{CAT}\begin{bmatrix}\text{HEAD }\boxed{1}\\\text{COMPS }\left\langle\text{V}\begin{bmatrix}\text{HEAD }\boxed{1}\\\text{COMPS }\boxed{2}\\\text{CONT }\boxed{3}\end{bmatrix}\right\rangle\oplus\boxed{2}\end{bmatrix}\\\text{CONT}\begin{bmatrix}\textit{neg-rel}\\\text{ARG }\boxed{3}\end{bmatrix}\end{bmatrix}\end{bmatrix}$$

This lexical entry roughly corresponds to the entry for Italian auxiliary verbs (and restructuring verbs with clitic climbing), in that the negator *non* selects a verbal complement and, further, inherits that verb's complement list. One key property of *non* is its HEAD value: this value is in a sense undetermined, but structure-shared with the HEAD value of its verbal complement. The value is thus determined by what *non* combines with. When *non* combines with a finite verb, it will be a finite verb, and when it combines with an infinitival verb, it will be a non-finite verb.

<sup>14</sup>One main difference between *non* and *loro* is that *non* is a head, whereas *loro* is a complement XP. See Monachesi (1998) for further discussion of the behavior of *loro* and its treatment.


In order to see how this system works, let us consider an Italian example where the negator combines with a transitive verb as in (1d), repeated here as (46):

(46) Gianni non legge articoli di sintassi. (Italian)
Gianni NEG reads articles of syntax
'Gianni doesn't read syntax articles.'

When the negator *non* combines with the finite verb *legge* 'reads' that selects an NP object, the resulting combination will form the verb complex structure given in Figure 9.

 

Figure 9: Verb complex structure of (46)
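Schematically, this structure amounts to the following. This is a simplified sketch assuming the entry in (45), with 1 the shared HEAD value, 2 the shared COMPS list, and 3 the verbal complement:

$$\text{V}\begin{bmatrix}\text{HEAD }\boxed{1}\\\text{COMPS }\boxed{2}\end{bmatrix}\rightarrow\textit{non}\begin{bmatrix}\text{HEAD }\boxed{1}\\\text{COMPS }\left\langle\boxed{3}\right\rangle\oplus\boxed{2}\end{bmatrix}\;\boxed{3}\,\textit{legge}\begin{bmatrix}\text{HEAD }\boxed{1}\,\textit{fin}\\\text{COMPS }\boxed{2}\left\langle\text{NP}\right\rangle\end{bmatrix}$$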

Borsley (2006), adopting Kathol's (2000) topological approach, provides a linearization-based HPSG analysis capturing the distributional possibilities of negation in Italian and Welsh, which we have seen in (43a) and (43b), respectively. Different from Borsley & Jones's (2005) selectional approach, where a negative expression selects its own complement, Borsley's linearization-based approach allows the negative expression to have a specified topological field. For instance, Borsley (2006: 79), accepting the analysis of Kim (2000) where *non* is taken to be a type of clitic-auxiliary, posits the following order domain:

(47)

$$\begin{bmatrix}\text{DOM }\left\langle\begin{bmatrix}\textit{first}\\\left\langle\textit{Gianni}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{second}\\\text{NEG }+\\\left\langle\textit{non}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{third}\\\left\langle\textit{telefona}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{third}\\\text{NEG }+\\\left\langle\textit{nessuno}\right\rangle\end{bmatrix}\right\rangle\end{bmatrix}$$

With this ordering domain, Borsley (2006) postulates that the Italian sentential negator *non*, bearing the positive NEG feature, is in the second field.<sup>15</sup> The analysis can then attribute the distributional differences between the Italian and Welsh negators to the difference in their domain values. That is, in Borsley's analysis, the Welsh NEG expression *ddim*, unlike Italian *non*, is required to be in the third field, as illustrated in the following domain for the sentence (43b) (from Borsley 2006: 76):<sup>16</sup>

(48)

$$\begin{bmatrix}\text{DOM }\left\langle\begin{bmatrix}\textit{second}\\\left\langle\textit{dw}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{third}\\\left\langle\textit{i}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{third}\\\text{NEG }+\\\left\langle\textit{ddim}\right\rangle\end{bmatrix},\begin{bmatrix}\textit{third}\\\left\langle\textit{wedi gweld neb}\right\rangle\end{bmatrix}\right\rangle\end{bmatrix}$$

As such, with the assumption that constituents have an order domain to which ordering constraints apply, the topological approach enables us to capture the complex distributional behavior of the negators in Italian and Welsh.

# **6 Other related phenomena**

In addition to this work focusing on the distributional possibilities of negation, there has also been HPSG work on genitive of negation and negative concord.

Przepiórkowski (2000) offers an HPSG analysis of the non-local genitive of negation in Polish. In Polish, negation is realized as the prefix *nie* on a verbal expression (see Przepiórkowski & Kupść 1999; Przepiórkowski 2000; 2001), and Polish allows the object argument to be genitive-marked when the negative marker is present, as in (49b). The assignment of genitive case to the object need not be local, as shown in (50b) (data from Przepiórkowski 2000: 120):

<sup>15</sup>Borsley (2006) also notes that Italian negative expressions like *nessuno* 'nobody' also bear the feature NEG but are required to be in the third field.

<sup>16</sup>Different from Borsley (2006), Borsley & Jones (2000) offer a selectional analysis of Welsh negation. That is, the finite negative verb selects two complements (e.g., subject and object) while the non-finite negative verb selects a VP. See Borsley & Jones (2000) for details.


(49) a. Lubię Marię.
like.1SG Mary.ACC
'I like Mary.'

b. Nie lubię Marii / \*Marię.
NEG like.1SG Mary.GEN / Mary.ACC
'I don't like Mary.'

(50) a. Janek wydawał się lubić Marię.
John seemed RM like.INF Mary.ACC
'John seemed to like Mary.'

b. Janek nie wydawał się lubić Marii / Marię.
John NEG seemed RM like.INF Mary.GEN / Mary.ACC
'John did not seem to like Mary.'

To account for this kind of phenomenon, Przepiórkowski (2000) suggests that the combination of the negative morpheme *nie* with a verb stem introduces the feature NEG. With this lexical specification, his analysis introduces the following principle (adapted from Przepiórkowski 2000: 143):

(51) Part of the Case Principle for Polish:

$$\begin{bmatrix}\text{HEAD}\begin{bmatrix}\textit{verb}\\\text{NEG }+\end{bmatrix}\\\text{ARG-ST }\boxed{1}\,\textit{nelist}\oplus\left\langle[\text{CASE }\textit{str}]\right\rangle\oplus\boxed{2}\end{bmatrix}\Rightarrow\begin{bmatrix}\text{ARG-ST }\boxed{1}\oplus\left\langle[\text{CASE }\textit{sgen}]\right\rangle\oplus\boxed{2}\end{bmatrix}$$

The principle allows a NEG+ verbal expression to assign the CASE value *gen* to all non-initial arguments. This is why the negative word *nie* triggers the object complement in (49b) to be GEN-marked. As for the non-local genitive in (50b), Przepiórkowski (2000: 145) allows the verbal complement of a raising verb like *seem* to optionally undergo lexical argument composition. This process yields the following output for the matrix verb in (50b):

(52) Representation for *nie wydawał się* 'did not seem' when combined with *lubić* 'like':

$$\begin{bmatrix}\text{PHON }\left\langle\textit{nie wydawał się}\right\rangle\\\text{HEAD}\begin{bmatrix}\textit{verb}\\\text{NEG }+\end{bmatrix}\\\text{ARG-ST }\left\langle\text{NP},\;\text{V}\begin{bmatrix}\text{COMPS }\left\langle\boxed{1}\,\text{NP}[\textit{str}]\right\rangle\end{bmatrix}\right\rangle\oplus\left\langle\boxed{1}\right\rangle\end{bmatrix}$$

This lexical specification allows the object NP of the embedded verb to be *gen*-marked in accordance with the constraint in (51). In Przepiórkowski's analysis, the feature NEG thus tightly interacts with the mechanism of argument composition and lexical construction-specific case assignment (or satisfaction).

Negation in languages like French, Italian, and Polish, among others, also involves negative concord. De Swart & Sag (2002) investigate negative concord in French, where multiple occurrences of negative constituents express either double negation or single negation:


The double negation reading in (53) has two quantifiers, while the single negation reading is an instance of negative concord, where the two quantifiers merge into one. De Swart & Sag (2002) assume that the information contributed by each quantifier is stored in QSTORE and retrieved at the lexical level in accordance with constraints on the verb's arguments and semantic content. For instance, the verb *n'aime* in (53) will have two different ways of retrieving the QSTORE value, as given in the following:<sup>17</sup>
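Schematically, the two retrieval options in (54) amount to the following. This is a simplified sketch, with NO standing for the negative quantifier; the full AVMs can be found in de Swart & Sag (2002):

(54) a. $\text{QUANTS }\left\langle\text{NO}_x,\text{NO}_y\right\rangle$ (iteration: $\neg\exists x\neg\exists y[\text{love}(x,y)]$)

b. $\text{QUANTS }\left\langle\text{NO}_{x,y}\right\rangle$ (resumption: $\neg\exists x\exists y[\text{love}(x,y)]$)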


In the AVM (54a), the two quantifiers are retrieved, inducing double negation (¬∃x¬∃y[love(x,y)]), while in (54b), the two have a resumptive interpretation in which they are merged into one (¬∃x∃y[love(x,y)]).<sup>18</sup> This analysis, coupled with the complement treatment of *pas* as a lexically stored quantifier, can account for why *pas* does not induce a resumptive interpretation with a quantifier (from de Swart & Sag 2002: 376):

(55) Il ne va pas nulle part, il va à son travail. (French)
he NEG goes NEG no where he goes at his work
'He does not go nowhere, he goes to work.'

<sup>17</sup>The QSTORE value contains information roughly equivalent to first order logic expressions like *NO*x[Person(x)]. See de Swart & Sag (2002).

<sup>18</sup>See de Swart & Sag (2002) for detailed formulation of the retrieval of stored value.


In this standard French example, de Swart & Sag (2002), accepting Kim's (2000) analysis of *pas* as a complement, specify the meaning of the adverbial complement *pas* to be included as a negative quantifier in the QUANTS value. This means there would be no resumptive reading for standard French, inducing double negation as in (56):<sup>19</sup>

(56)

$$\begin{bmatrix}\text{PHON }\left\langle\textit{ne va}\right\rangle\\\text{ARG-ST }\left\langle\text{NP},\;\text{Adv}^{I}[\text{QSTORE }\{\boxed{1}\}],\;\text{NP}[\text{QSTORE }\{\boxed{2}\}]\right\rangle\\\text{QUANTS }\left\langle\boxed{1},\boxed{2}\right\rangle\end{bmatrix}$$
Przepiórkowski & Kupść (1999) and Borsley & Jones (2000) also investigate negative concord in Polish and Welsh and offer HPSG analyses. Consider a Welsh example from Borsley & Jones (2000: 17):

(57) Nid oes neb yn yr ystafell. (Welsh)
NEG is no.one in the room
'There is no one in the room.'

Borsley & Jones (2000), identifying n-words with the feature NC (negative concord), take the verb *nid oes* 'NOT is' to bear the positive NEG value and specify the subject *neb* to carry the positive NC feature. This selectional approach, interacting with well-defined features, tries to capture how more than one negative element can correspond to a single semantic negation (see Borsley & Jones 2000 for detailed discussion).
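In rough outline, the relevant lexical specification can be sketched as follows. This is a simplified sketch: the exact feature geometry is Borsley & Jones's, and the complement list follows footnote 16 (the finite negative verb takes its subject and the PP as complements):

$$\textit{nid oes}:\begin{bmatrix}\text{HEAD}\begin{bmatrix}\textit{verb}\\\text{NEG }+\end{bmatrix}\\\text{COMPS }\left\langle\text{NP}[\text{NC }+],\;\text{PP}\right\rangle\end{bmatrix}$$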

# **7 Conclusion**

One of the most attractive consequences of the derivational perspective on negation has been that one uniform category, given other syntactic operations and constraints, explains the derivational properties of all types of negation in natural languages, and can further provide a surprisingly close and parallel structure among languages, whether typologically related or not. However, this line of thinking runs the risk of missing the particular properties of each type of negation. Each individual language has its own way of expressing negation, and moreover has its own restrictions in the surface realizations of negation which can hardly be reduced to one uniform category.

<sup>19</sup>See de Swart & Sag (2002), Richter & Sailer (2004), and Koenig & Richter (2021: Section 6.2.1), Chapter 22 of this volume for cases where *pas* induces negative concord.

In the non-derivational HPSG analyses of the four main types of sentential negation that I have reviewed in this chapter, there is no uniform syntactic element, though a certain universal aspect of negation does exist, viz. its semantic contribution. Languages appear to employ various ways of negating a clause or sentence, and negation can be realized as different morphological and syntactic categories. By admitting this range of morphological and syntactic categories, it was possible to capture their idiosyncratic properties in a simple and natural manner. Furthermore, this theory has been built upon the Lexical Integrity Principle, the thesis that the principles that govern the composition of morphological constituents are fundamentally different from the principles that govern sentence structures. The obvious advantage of this perspective is that it can capture the distinct properties of morphological and syntactic negation, and also of their distribution, in a much more complete and satisfactory way.

# **Abbreviations**

3SGS 3rd singular subject
3PLO 3rd plural object
CONN connective
DEL delimiter
HON honorific
NPST nonpast
RM reflexive marker

# **Acknowledgments**

I thank the reviewers of this chapter for detailed comments and suggestions, which helped improve the quality of this chapter a lot. I also thank Anne Abeillé, Bob Borsley, Jean-Pierre Koenig, and Stefan Müller for constructive comments on earlier versions of this chapter. My thanks also go to Okgi Kim, Rok Sim, and Jungsoo Kim for helpful feedback.

# **References**

Abeillé, Anne & Robert D. Borsley. 2021. Basic properties and elements. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 3–45. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599818.




# **Chapter 19**

# **Ellipsis**

# Joanna Nykiel

University of Oslo

# Jong-Bok Kim

Kyung Hee University, Seoul

This chapter provides an overview of the HPSG analyses of elliptical constructions. It first discusses three types of ellipsis (nonsentential utterances, predicate ellipsis, and non-constituent coordination) that have attracted much attention in HPSG. It then reviews existing evidence for and against the so-called direct interpretation or WYSIWYG (what you see is what you get) perspective on ellipsis, where no invisible material is posited at the ellipsis site. The chapter then recaps the key points of existing HPSG analyses applied to the three types of ellipsis.

# **1 Introduction**

Ellipsis is a phenomenon that involves a non-canonical mapping between syntax and semantics. What appears to be a syntactically incomplete utterance still receives a semantically complete representation, based on the features of the surrounding context, be it linguistic or nonlinguistic. The goal of syntactic theory is thus to account for how the complete semantics can be reconciled with the apparently incomplete syntax. One of the key questions here relates to the structure of the ellipsis site, that is, whether or not we should assume the presence of invisible syntactic material. Section 2 introduces three types of ellipsis (nonsentential utterances, predicate ellipsis, and non-constituent coordination) that have attracted considerable attention and received treatment within HPSG (our focus here is on standard HPSG rather than Sign-Based Construction Grammar; Sag 2012; see also Abeillé & Borsley 2021: Section 7.2, Chapter 1 of this volume and Müller 2021b: Section 1.3.2, Chapter 32 of this volume on SBCG, and Abeillé & Chaves 2021: Section 7, Chapter 16 of this volume on non-constituent coordination). In Section 3 we overview existing evidence for and against the so-called WYSIWYG (what you see is what you get) approach to ellipsis, where no invisible material is posited at the ellipsis site. Finally, in Sections 4–6, we walk the reader through three types of HPSG analyses applied to the three types of ellipsis presented in Section 2. Our purpose is to highlight the nonuniformity of these analyses, along with the underlying intuition that ellipsis is not a uniform phenomenon. Throughout the chapter, we also draw the reader's attention to the key role that corpus and experimental data play in HPSG theorizing, which sets it apart from frameworks that primarily rely on intuitive judgments.

# **2 Three types of ellipsis**

Based on the type of analysis they receive in HPSG, elliptical phenomena can be broadly divided into three types: nonsentential utterances, predicate ellipsis, and non-constituent coordination.<sup>1</sup> We overview the key features of these types here before discussing in greater detail how they have been brought to bear on the question of whether there is invisible syntactic structure at the ellipsis site or not. We begin with stranded XPs, which HPSG treats as nonsentential utterances, and then move on to predicate and argument ellipsis, followed by phenomena known as non-constituent coordination.

# **2.1 Nonsentential utterances**

This section introduces utterances smaller than a sentence, which we refer to as *nonsentential utterances* (NSUs). These range from *Bare Argument Ellipsis* (BAE)<sup>2</sup> as in (1), through fragment answers as in (2), to direct or embedded fragment questions (sluicing) as in (3)–(4):


<sup>1</sup>For more detailed discussion, see Kim & Nykiel (2020).

<sup>2</sup>This term is used in Culicover & Jackendoff (2005: 6).


(4) A: There's someone at the door. B: Who?/I wonder who.

As illustrated by these examples, sluicing involves stranded *wh*-phrases and has the function of an interrogative clause, while Bare Argument Ellipsis involves XPs representing various syntactic categories and typically has the function of a clause (Ginzburg & Sag 2000: 313, Culicover & Jackendoff 2005: 233).<sup>3</sup>

The key theoretical question nonsentential utterances raise is whether they are, on the one hand, parts of larger sentential structures or, on the other, nonsentential structures whose semantic and morphosyntactic features are licensed by the surrounding context. To adjudicate between these views, researchers have looked for evidence that nonsentential utterances in fact behave as if they were fragments of sentences. As we will see in Section 3, there is evidence to support both of these views. However, HPSG doesn't assume that nonsentential utterances are underlyingly sentential structures.

# **2.2 Predicate ellipsis and argument ellipsis**

This section looks at four constructions whose syntax includes null or unexpressed elements. These constructions are *Post-Auxiliary Ellipsis* (PAE),<sup>4</sup> which is a term we are using here for what is more typically referred to as *Verb Phrase Ellipsis* (VPE); pseudogapping; *Null Complement Anaphora* (NCA); and *argument drop* (or *pro*-drop). Post-Auxiliary Ellipsis features stranded auxiliary verbs as in (5), while pseudogapping, also introduced by an auxiliary verb, has a remnant right after the pseudo gap as in (6). Null Complement Anaphora is characterized by omission of complements to some lexical verbs as in (7), while argument drop refers to omission of a pronominal subject or an object argument, as illustrated in (8) for Polish.


<sup>3</sup>Several subtypes of nonsentential utterances can be distinguished, based on their contextual functions, an issue we leave open here (for a recent taxonomy, see Ginzburg 2012: 217).

<sup>4</sup>The term Post-Auxiliary Ellipsis was introduced by Sag (1976: 53) and covers cases where a non-VP element is elided after an auxiliary verb, as in *You think I am a superhero, but I am not.*


(8) Pia późno wróciła do domu. Od razu poszła spać. (Polish)
    Pia late get.PST.SG to home.GEN immediately go.PST.SG sleep.INF
    'Pia got home late. She went straight to bed.' (argument drop)

One key question raised by such constructions is whether these unrealized null elements should be assumed to be underlyingly present in the syntax of these constructions, and the answer is rather negative (see Section 3). Another question is whether theoretical analyses of constructions like Post-Auxiliary Ellipsis should be enriched with usage preferences, since these constructions compete with *do it/that/so* anaphora in predictable ways (see Miller 2013 for a proposal).

# **2.3 Non-constituent coordination**

We now focus on three instances of non-constituent coordination – gapping (Ross 1967: 171), Right Node Raising (RNR), and Argument Cluster Coordination (ACC) – illustrated in (9), (10), and (11), respectively.


In Right Node Raising, a single constituent located in the right-peripheral position is associated with both conjuncts. In both Argument Cluster Coordination and gapping, a finite verb is associated with all (two or more) conjuncts but is only present in the leftmost one. Additionally, in Argument Cluster Coordination, the subject of the first conjunct is also associated with the second conjunct but is only present in the former. These phenomena illustrate what appears to be coordination of standard constituents with elements not normally defined as constituents (a cluster of NPs in (9), a stranded transitive verb in (10), and a cluster of NP and PP in (11)).

To handle such constructions, the grammar must be permitted to (a) coordinate non-canonical constituents, (b) generate coordinated constituents, parts of which are subject to an operation akin to deletion, or (c) coordinate VPs with nonsentential utterances. As we will see throughout this chapter, HPSG analyses of these constructions make use of all three options.


# **3 Evidence for and against invisible material at the ellipsis site**

This section is concerned with nonsentential utterances and Post-Auxiliary Ellipsis, since this is where the contentious issues arise of whether there is invisible syntactic material in an ellipsis site (Sections 3.1 and 3.2) and of where ellipsis is licensed (Sections 3.3 and 3.4). Below, we consider evidence from the literature for and against invisible structure. As we will see, the evidence is based not only on intuitive judgments, but also on experimental and corpus data, the latter being more typical of the HPSG tradition.

# **3.1 Connectivity effects**

Connectivity effects refer to parallels between nonsentential utterances and their counterparts in sentential structures, thus speaking in favor of the existence of silent sentential structure. We focus on two kinds here: case-matching effects and preposition-stranding effects (for other examples of connectivity effects, see Ginzburg & Miller 2018). It's been known since Ross (1967) that nonsentential utterances exhibit case-matching effects, that is, they are typically marked for the same case that is marked on their counterparts in sentential structures. (12) illustrates this for German, where case matching is seen between a *wh*-phrase functioning as a nonsentential utterance and its counterpart in the antecedent (Merchant 2005b: 663):

(12) Er will jemandem schmeicheln, aber sie wissen nicht wem / \*wen.
     he wants someone.DAT flatter but they know not who.DAT / who.ACC
     'He wants to flatter someone, but they don't know whom.'

Case-matching effects are crosslinguistically robust in that they are found in the vast majority of languages with overt case marking systems, and therefore, they have been taken as strong evidence for the reality of silent structure. The argument is that the pattern of case matching follows straightforwardly if a nonsentential utterance is embedded in silent syntactic material whose content includes the same lexical head that assigns case to the nonsentential utterance's counterpart in the antecedent clause (Merchant 2001; 2005a). However, a language like Hungarian poses a problem for this reasoning (Jacobson 2016). While Hungarian has verbs that assign one of two cases to their object NPs in overt
clauses with no meaning difference, case matching is still required between a nonsentential utterance and its counterpart, whichever case is marked on the counterpart. To see this, consider (13) from Jacobson (2016: 356). The verb *hasonlit* 'resembles' assigns either sublative (SUBL) or allative (ALL) case to its object, but if the sublative is selected for a nonsentential utterance's counterpart, the nonsentential utterance must match this case.

(13) A: Ki-re hasonlit Péter?
        who-SUBL resemble.PRS.SG Péter
        'Who does Péter resemble?'
     B: János-ra / ?János-hoz.
        János-SUBL / János-ALL
        'János.'

Jacobson (2016) notes that there is some speaker variation regarding the (un)acceptability of case mismatch here, while all speakers agree that either case is fine in a corresponding nonelliptical response to (13A). This last point is important, because it shows that the requirement of—or at least a preference for—matching case features applies to nonsentential utterances to a greater extent than it does to their nonelliptical equivalents, challenging connectivity effects.

Similarly problematic for case-based parallels between nonsentential utterances and their sentential counterparts are some Korean data. Korean nonsentential utterances can drop case markers more freely than their counterparts in nonelliptical clauses can, a point made in Morgan (1989) and Kim (2015). Observe the example in (14) from Morgan (1989: 237).

(14) B: Yongsu-ka / Yongsu / \*Yongsu-lul.
        Yongsu-NOM / Yongsu / Yongsu-ACC
        'Yongsu.'
     B′: Yongsu-ka / \*Yongsu ku chaek-ul sa-ass-e.
         Yongsu-NOM / Yongsu the book-ACC buy-PST-DECL
         'Yongsu bought the book.'

When a nonsentential utterance corresponds to a nominative subject in the antecedent (as in (14B)), it can either be marked for nominative or be caseless. However, replacing the same nonsentential utterance with a full sentential answer,
as in (14B′), rules out case drop from the subject. This strongly suggests that the case-marked and caseless nonsentential utterances couldn't have identical source sentences if they were to derive via PF-deletion (deletion in the phonological component).<sup>5</sup> Data like these led Morgan (1989) to propose that not all nonsentential utterances have a sentential derivation, an idea later picked up in Barton (1998).

The same pattern is associated with semantic case. That is, in (15), if a nonsentential utterance is case-marked, it needs to be marked for comitative case like its counterpart in the A-sentence, but it may also simply be caseless. However, being caseless is not an option for the nonsentential utterance's counterpart in a sentential response to A (Kim 2015: 280).


The generalization for Korean is then that nonsentential utterances may be optionally realized as caseless, but may never be marked for a different case than is marked on their counterparts.
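
Stated this way, the Korean generalization amounts to a one-line licensing condition. The following sketch is purely illustrative and ours alone (the function name and the encoding of case values as strings, with `None` for caselessness, are assumptions, not part of any published analysis):

```
# Minimal sketch of the Korean case generalization for nonsentential
# utterances: an NSU may be caseless, but if case-marked it must match
# the case of its correlate. The encoding is illustrative only.

def nsu_case_ok(nsu_case, correlate_case):
    """None encodes a caseless NSU; strings encode case values."""
    return nsu_case is None or nsu_case == correlate_case

# (14B): the correlate is the nominative subject 'Yongsu-ka'.
print(nsu_case_ok(None, "NOM"))   # True  -- caseless 'Yongsu'
print(nsu_case_ok("NOM", "NOM"))  # True  -- 'Yongsu-ka'
print(nsu_case_ok("ACC", "NOM"))  # False -- '*Yongsu-lul'
```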

Overall, case-marking facts show that there is some morphosyntactic identity between nonsentential utterances and their antecedents, though not to the extent that nonsentential utterances have exactly the features that they would have if they were constituents embedded in sentential structures. The Hungarian facts also suggest that those aspects of the argument structure of the appropriate lexical heads present in the antecedent that relate to case licensing are relevant for an analysis of nonsentential utterances.<sup>6</sup>

The second kind of connectivity effect goes back to Merchant (2001; 2005a) and highlights apparent links between the features of nonsentential utterances and *wh*- and focus movement (leftward movement of a focus-bearing expression). The idea is that prepositions behave the same under *wh*- and focus movement as they do under clausal ellipsis, that is, they pied-pipe or strand in the same environments.

<sup>5</sup>Nominative (in Korean) differs in this respect from three other structural cases in the language—dative, accusative, and genitive—in that these three may be dropped from nonelliptical clauses (see Morgan 1989; Lee 2016; Kim 2016). However, see Müller (2002) for a discussion of German dative and genitive as lexical cases.

<sup>6</sup>Hungarian and Korean are not the only problematic languages; for a list, see Vicente (2015).


If a language (e.g., English) permits preposition stranding under *wh*- and focus movement (*What did Harvey paint the wall with?* vs. *With what did Harvey paint the wall?*), then nonsentential utterances may surface with or without prepositions, as illustrated in (16) for sluicing and Bare Argument Ellipsis (see Section 4 for a theoretical analysis of this variation).

(16) A: I know what Harvey painted the wall with. B: (With) what?/(With) primer.

If there were indeed a link between preposition stranding and nonsentential utterances, then we would expect prepositionless nonsentential utterances to only be possible in languages with preposition stranding. This expectation is, however, disconfirmed by an ever-growing list of non-preposition stranding languages that do feature prepositionless nonsentential utterances: Brazilian Portuguese (Almeida & Yoshida 2007), Spanish and French (Rodrigues et al. 2009), Greek (Molimpakis 2019), Bahasa Indonesia (Fortin 2007), Russian (Philippova 2014), Polish (Szczegielniak 2008; Sag & Nykiel 2011; Nykiel 2013), Bulgarian (Abels 2017), Serbo-Croatian (Stjepanović 2008; 2012), Mauritian (Abeillé & Hassamal 2019), and Arabic (Leung 2014; Alshaalan & Abels 2020). A few of these studies have presented experimental evidence that prepositionless nonsentential utterances are acceptable, though for reasons still poorly understood, they typically do not reach the same level of acceptability as their variants with prepositions do (see Nykiel 2013; Nykiel & Kim 2021 for Polish, Molimpakis 2019 for Greek, and Alshaalan & Abels 2020 for Saudi Arabic). For more experimental and corpus work on connectivity effects, see Sag & Nykiel 2011 for Polish and Nykiel 2015; 2017; Nykiel & Hawkins 2020 for English.

It is evident from this research that there is no grammatical constraint on nonsentential utterances that keeps track of what preposition-stranding possibilities exist in any given language. On the other hand, it does not seem sufficient to assume that nonsentential utterances can freely drop prepositions, given examples of sprouting like (17), in which prepositions are not omissible (see Chung et al. 1995).<sup>7</sup> As noted by Chung et al. (1995: 250), the difference between the merger type of sluicing (16) and the sprouting type of sluicing (17) is that there is an explicit phrase that the nonsentential utterance corresponds to in the former but not in the latter (in the HPSG literature, this phrase is termed a Salient Utterance by Ginzburg & Sag 2000: 313 or a Focus-Establishing Constituent by Ginzburg 2012: 234).

<sup>7</sup>However, Hardt et al.'s (2020) corpus data yield examples of sprouting where prepositions are dropped from nonsentential utterances that serve as adjuncts rather than arguments, as in *A: Then you see where they're going to place it. B: What night?*


(17) A: I know Harvey painted the wall. B: \*(With) what?/Yeah, \*(with) primer.

The challenge posed by (17) is how to ensure that the nonsentential utterance is a PP matching the implicit PP argument in the A-sentence (see the discussion around (35b) for further detail). This challenge has not received much attention in the HPSG literature, though see Kim (2015).

# **3.2 Island effects**

One of the predictions from the view that nonsentential utterances are underlyingly sentential is that they should respect island constraints on long-distance movement (see Chaves 2021, Chapter 15 of this volume for a discussion of islands in HPSG). But as illustrated below, nonsentential utterances (both sluicing and Bare Argument Ellipsis) exhibit island-violating behavior.<sup>8</sup> The nonsentential utterance in (18) would be illicitly extracted out of an adjunct (*\*Where does Harriet drink scotch that comes from?*) and the nonsentential utterance in (19) would be extracted out of a complex NP (*\*The Gay Rifle Club, the administration has issued a statement that it is willing to meet with*).<sup>9</sup>


Among Culicover & Jackendoff's (2005: 245) examples of well-formed island-violating nonsentential utterances are also sprouted nonsentential utterances (those that correspond to implicit phrases in the antecedent) like (20)–(21).

(20) A: John met a woman who speaks French. B: With an English accent?

<sup>8</sup>As noted earlier, the derivational approaches need to move a remnant or nonsentential utterance to the sentence-initial position and delete a clausal constituent, since only constituents can be deleted. See Merchant (2001; 2010) for details.

<sup>9</sup>Merchant (2005a) argued that Bare Argument Ellipsis, unlike sluicing, does respect island constraints, an argument that was later challenged (see, e.g., Culicover & Jackendoff 2005: 239; Griffiths & Lipták 2014). However, Merchant (2005a) focused specifically on pairs of *wh*-interrogatives and answers to them, running into the difficulty of testing for island-violating behavior, since a well-formed *wh*-interrogative antecedent could not be constructed.


(21) A: For John to flirt at the party would be scandalous. B: Even with his wife?

Other scholars assume that sprouted nonsentential utterances are one of the two kinds of nonsentential utterances that respect island constraints, the other kind being contrastive nonsentential utterances, illustrated in (22) (Chung et al. 1995; Merchant 2005a; Griffiths & Lipták 2014).

(22) A: Does Abby speak the same Balkan language that Ben speaks? B: \* No, Charlie. (Merchant 2005a: 688)

Schmeh et al. (2015) further explore the acceptability of nonsentential utterances preceded by the response particle *no* like those in (22) compared to nonsentential utterances introduced by the response particle *yes*, depicted in (23). (22) and (23) differ in terms of discourse function in that the latter supplements the antecedent rather than correcting it, a discourse function signaled by the response particle *yes*.

(23) A: John met a guy who speaks a very unusual language. B: Yes, Albanian. (Culicover & Jackendoff 2005: 245)

Schmeh et al. (2015) find that corrections with *no* lead to lower acceptability ratings compared to supplementations with *yes* and propose that this follows from the fact that corrections induce greater processing difficulty than supplementations do, hence the acceptability difference between (22) and (23). This finding makes it plausible that the perceived degradation of island-violating nonsentential utterances could ultimately be attributed to nonsyntactic factors, e.g., the difficulty of successfully computing a meaning for them.

In contrast to nonsentential utterances, many instances of Post-Auxiliary Ellipsis appear to respect island constraints, as would be expected if there were unpronounced structure from which material was extracted. An example of a relative clause island is depicted in (24) (note that the corresponding sluicing nonsentential utterance is fine).

(24) \* They want to hire someone who speaks a Balkan language, but I don't remember which they do [want to hire someone who speaks ]. (Merchant 2001: 6)

(24) contrasts with well-formed island-violating examples like (25a) and (25b), as observed by Miller (2014) and Ginzburg & Miller (2018). 10

<sup>10</sup>Miller (2014) cites numerous corpus examples of island-violating pseudogapping.


	- b. He was able to find a bakery where they make good baguette, but croissants, he couldn't [find a bakery where they make good ]. (Ginzburg & Miller 2018: 90)

As Ginzburg & Miller (2018) rightly point out, we do not yet have a complete understanding of when or why island effects show up in Post-Auxiliary Ellipsis. Its behavior is at best inconsistent, failing to provide convincing evidence for silent structure.

# **3.3 Structural mismatches**

Because structural mismatches are rare or absent from nonsentential utterances (see Merchant 2005a; 2013),<sup>11</sup> this section focuses on Post-Auxiliary Ellipsis and developments surrounding the question of which contexts license it. In a seminal study of anaphora, Hankamer & Sag (1976) classified Post-Auxiliary Ellipsis as a surface anaphor with syntactic features closely matching those of an antecedent present in the linguistic context. They argued in particular that Post-Auxiliary Ellipsis is not licensed if it mismatches its antecedent in voice. Compare the following two examples from Hankamer & Sag (1976: 327).

	- b. The children asked to be squirted with the hose, so they were.

<sup>11</sup>Given the assumption that canonical sprouting nonsentential utterances have VP antecedents, as in (17), Ginzburg & Miller (2018: 95) cite examples—originally from Beecher (2008: 13)—of sprouting nonsentential utterances with nominal, hence mismatched, antecedents, e.g., (i).

(i) We're on to the semi-finals, though I don't know who against.

Further examples where nonsentential utterances refer to an NP or AP antecedent appear in COCA (Corpus of Contemporary American English):

The nonsentential utterances in (ii)–(iii) repeat the lexical heads whose complements are being sprouted (*defense* and *fallen*), that is, they contain more material than is usual for nonsentential utterances (cf. (i)). It seems that without this additional material it would be difficult to integrate the nonsentential utterances into the propositions provided by the antecedents and hence to arrive at the intended interpretations.


This proposal places tighter structural constraints on Post-Auxiliary Ellipsis than on other verbal anaphors (e.g., *do it/that*) in terms of identity between an ellipsis site and its antecedent. This has prompted extensive evaluation in a number of corpus and experimental studies in the subsequent decades. Below are examples of acceptable structural mismatches reported in the literature, ranging from voice mismatch in (27a) to nominal antecedents in (27b) and to split antecedents in (27c).<sup>12</sup>

	- b. Mubarak's survival is impossible to predict and, even if he does , his plan to make his son his heir apparent is now in serious jeopardy. (Miller & Hemforth 2014: 7)
	- c. Mary wants to go to Spain and Fred wants to go to Peru but because of limited resources only one of them can . (Webber 1979: 128)

There are two opposing views that have emerged from the empirical work regarding the acceptability and grammaticality of structural mismatches in Post-Auxiliary Ellipsis. The first view takes mismatches to be grammatical and connects degradation in acceptability to violations of certain independent constraints on discourse (Kehler 2002; Miller 2011; 2014; Miller & Hemforth 2014; Miller & Pullum 2014) or processing (Kim et al. 2011). Two types of Post-Auxiliary Ellipsis have been identified on this view through extensive corpus work—auxiliary choice Post-Auxiliary Ellipsis and subject choice Post-Auxiliary Ellipsis—each with different discourse requirements with respect to the antecedent (Miller 2011; Miller & Hemforth 2014; Miller & Pullum 2014). The second view assumes that there is a grammatical ban on structural mismatch, but violations may be repaired under certain conditions; repairs are associated with differential processing costs compared to matching ellipses and antecedents (Arregui et al. 2006; Grant et al. 2012). If we follow the first view, it is perhaps unexpected that voice mismatch should consistently incur a greater acceptability penalty under Post-Auxiliary Ellipsis than when no ellipsis is involved, as recently reported in Kim & Runner (2018).<sup>13</sup>

<sup>12</sup>Miller (2014: 87) also reports cases of structural mismatch with English comparative pseudogapping, as in (i) from COCA:

<sup>(</sup>i) These savory waffles are ideal for brunch, served with a salad as you would a quiche. (SanFranChron, 2012).

See also Abeillé et al. (2016) for examples of voice mismatch in French Right Node Raising.


Kim & Runner (2018) stop short of drawing firm conclusions regarding the grammaticality of structural mismatches, but one possibility is that the observed mismatch effects reflect a construction-specific constraint on Post-Auxiliary Ellipsis. HPSG analyses take structurally mismatched instances of Post-Auxiliary Ellipsis to be unproblematic and fully grammatical, while also recognizing construction-specific constraints: discourse or processing constraints formulated for Post-Auxiliary Ellipsis may or may not extend to other elliptical constructions, such as nonsentential utterances (see Abeillé et al. 2016; Ginzburg & Miller 2018 for this point).

# **3.4 Nonlinguistic antecedents**

Like structural mismatches, the availability of nonlinguistic (situational) antecedents for an ellipsis points to the fact that it need not be interpreted by reference to and licensed by a structurally identical antecedent. Although this option is somewhat limited, Post-Auxiliary Ellipsis does tolerate nonlinguistic antecedents, as shown in (28) (see also Hankamer & Sag 1976; Schachter 1977).

	- b. Once in my room, I took the pills out. "Should I?" I asked myself. (Miller & Pullum 2014: ex. 22a)

Miller & Pullum (2014) note that such examples are exophoric Post-Auxiliary Ellipsis, involving no linguistic antecedent for the ellipsis but just a situation in which the speaker articulates their opinion about the action involved. Miller & Pullum (2014) provide an extensive critique of the earlier work on the ability of Post-Auxiliary Ellipsis to take nonlinguistic antecedents, arguing for a streamlined discourse-based explanation that neatly captures the attested examples as well as examples of structural mismatch like those discussed in Section 3.3. The important point here is again that Post-Auxiliary Ellipsis is subject to construction-specific constraints which limit its use with nonlinguistic antecedents.

Nonsentential utterances appear in various nonlinguistic contexts as well. Ginzburg & Miller (2018) distinguish three classes of such nonsentential utterances: sluices (29a), exclamative sluices (29b), and declarative fragments (29c).

<sup>13</sup>But see Abeillé et al. (2016) for experimental results that show no acceptability penalty for voice mismatch in French Right Node Raising.


	- b. It makes people "easy to control and easy to handle," he said, "but, God forbid, at what cost!" (Ginzburg & Miller 2018: 96)
	- c. BOBADILLA turns, gestures to one of the other men, who comes forward and gives him a roll of parchment, bearing the royal seal. "My letters of appointment." (COCA FIC: Mov:1492: Conquest of Paradise, 1992)

In addition to being problematic from the licensing point of view, nonsentential utterances like these have been put forward as evidence against the idea that they are underlyingly sentential, because it is unclear what the structure that underlies them would be. There could be many potential sources for these nonsentential utterances (see Culicover & Jackendoff 2005: 306).<sup>14</sup>

# **4 Analyses of nonsentential utterances**

It is worth noting at the outset that the analyses of nonsentential utterances within the framework of HPSG are based on an elaborate theory of dialog (Ginzburg 1994; Ginzburg & Cooper 2004; 2014; Larsson 2002; Purver 2006; Fernández Rovira 2006; Fernández & Ginzburg 2002; Fernández et al. 2007; Ginzburg & Fernández 2010; Ginzburg et al. 2014; Ginzburg 2012; 2013; Kim & Abeillé 2019). Existing analyses of nonsentential utterances go back to Ginzburg & Sag (2000), who recognize declarative fragments as in (30a) and two kinds of sluicing nonsentential utterances: direct sluices as in (30b) and reprise sluices as in (30c) (the relevant fragments are bolded). The difference between direct and reprise sluices lies in the fact that the latter are requests for clarification of any part of the antecedent. For instance, in (30c), the referent of *that* is unclear to the interlocutor.

	- b. "You're waiting," she said softly. "**For what?**" (COCA FIC: Fantasy & Science Fiction, 2016)
	- c. "Can we please not say a lot about that?" "**About what?**" (COCA FIC: The chance, 2014)

<sup>14</sup>This is not to say that a sentential analysis of fragments without linguistic antecedents hasn't been attempted. For details of a proposal involving a "limited ellipsis" strategy, see Merchant (2005a) and Merchant (2010).


These different types of nonsentential utterances are derived from the Ginzburg & Sag (2000: 333) hierarchy of clausal types depicted in Figure 1.

Figure 1: Clausal hierarchy for fragments (Ginzburg & Sag 2000: 333)

Nonsentential utterances like declarative fragments (*decl-frag-cl*) are subtypes of *hd-frag-ph* (headed-fragment phrase) and *decl-cl* (declarative clause), while direct sluices (*slu-int-cl*) and reprise sluices (*dir-is-int-cl*) are subtypes of *hd-frag-ph* and *inter-cl* (interrogative clause). The type *slu-int-cl* is permitted to appear in independent and embedded clauses, hence it is underspecified for the head feature IC (independent clause). This specification contrasts with that of declarative fragments and reprise sluices, which are both specified as [IC+]. Ginzburg & Sag (2000: 305) use [IC+] to block declarative fragments and reprise sluices from appearing in embedded clauses (e.g., *A: What do they like? B: \*I doubt bagels*).<sup>15</sup> Ginzburg & Sag (2000: 304) make use of the constraint shown in (31), in which the two contextual attributes SAL-UTT and MAX-QUD play key roles in ellipsis resolution (we have added information about the MAX-QUD to generate nonsentential utterances):

<sup>15</sup>This feature specification, however, needs to be remedied for speakers who accept examples like *A: What does Kim take for breakfast? B: Lee says eggs.*



$$
\text{(31)}\quad \text{Head-Fragment Schema:}\quad
\begin{bmatrix}
\text{CAT } S\\
\text{CTXT}
\begin{bmatrix}
\text{MAX-QUD [PARAMS } \textit{neset}\text{]}\\
\text{SAL-UTT}
\left\{
\begin{bmatrix}
\text{CAT } \boxed{1}\\
\text{CONT|IND } i
\end{bmatrix}
\right\}
\end{bmatrix}
\end{bmatrix}
\rightarrow
\begin{bmatrix}
\text{CAT } \boxed{1}\,\text{[HEAD } \textit{nonverbal}\text{]}\\
\text{CONT|IND } i
\end{bmatrix}
$$

This constructional constraint first allows any non-verbal phrasal category (NP, AP, PP, AdvP) to be mapped onto a sentential utterance as long as it corresponds to a Salient Utterance (SAL-UTT).<sup>16</sup> This means that the head daughter's syntactic category must match that of the SAL-UTT, which is an attribute supplied by the surrounding context as a (sub)utterance of another contextual attribute—the Maximal Question under Discussion (MAX-QUD). The context gets updated with every new question-under-discussion, and MAX-QUD represents the most recent question-under-discussion appropriately specified for the feature PARAMS, whose value is a nonempty set (*neset*) of parameters.<sup>17</sup> SAL-UTT is the (sub)utterance with the widest scope within MAX-QUD. To put it informally, SAL-UTT represents a (sub)utterance of a MAX-QUD that has not been resolved yet. That is, it typically contains an interrogative phrase, an indefinite pronoun or a quantifier, but it can also contain a constituent of any length that has been misunderstood or not understood at all by one of the interlocutors. The feature CAT of SAL-UTT supplies information relevant for establishing morphosyntactic identity with a nonsentential utterance, that is, syntactic category and case information, and (31) requires that a nonsentential utterance match this information.

For illustration, consider the following exchange including a declarative fragment:

(32) A: What did Barry break? B: The mike.

<sup>16</sup>Ginzburg (2012) uses the Dialogue Game Board (DGB) to keep track of all information relating to the common ground between interlocutors. The DGB is also the locus of contextual updates arising from each newly introduced question-under-discussion. See Lücking, Ginzburg & Cooper (2021), Chapter 26 of this volume for more on Dialogue Game Boards.

<sup>17</sup>As defined in Ginzburg & Sag (2000: 304), the feature MAX-QUD is also specified for PROP (proposition) as part of its value. For the sake of simplicity, we suppress this feature here and further represent the value of MAX-QUD as a lambda abstraction, as in Figure 2. See Ginzburg & Sag (2000: 304) for the exact feature formulations of MAX-QUD.


In this dialog, the fragment *The mike* corresponds to the SAL-UTT *what*. Thus the constructional constraint in (31) would license a nonsentential utterance structure like Figure 2.

Figure 2: Structure of a declarative fragment clause

As illustrated in the figure, uttering the *wh*-question in (32A) evokes the QUD asking for the value of the variable *i* linked to the object that Barry broke. The nonsentential utterance *The mike* supplies that value. The structured dialogue thus plays a key role in the retrieval of the propositional semantics for the nonsentential utterance.
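
To make the workings of (31) concrete, here is a minimal sketch in our own encoding rather than the authors' formalism: feature structures are modeled as Python dictionaries, the MAX-QUD lambda abstract as a function, and SAL-UTT supplies the category the fragment must match.

```
# Illustrative sketch (not the formalism of Ginzburg & Sag 2000) of how
# the Head-Fragment Schema in (31) resolves (32).

def resolve_fragment(fragment, context):
    # (31): the fragment's CAT must match that of the SAL-UTT ...
    if fragment["CAT"] != context["SAL-UTT"]["CAT"]:
        raise ValueError("category mismatch with SAL-UTT")
    # ... and its index is identified with the SAL-UTT's index, so the
    # MAX-QUD abstract applies to the fragment's referent.
    return context["MAX-QUD"](fragment["referent"])

# (32) A: What did Barry break?  B: The mike.
context = {
    "MAX-QUD": lambda i: f"break(barry, {i})",  # λi.break(b, i)
    "SAL-UTT": {"CAT": "NP"},                   # the wh-phrase 'what'
}
fragment = {"CAT": "NP", "referent": "the_mike"}
print(resolve_fragment(fragment, context))      # -> break(barry, the_mike)
```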

This constructional approach has the advantage that it avoids the problems that Merchant (2001; 2005a) faces with respect to misalignments between preposition stranding under *wh*- and focus movement and the realization of nonsentential utterances as NPs or PPs, discussed in Section 3.1. Because the categories of SAL-UTT discussed in Ginzburg & Sag (2000) are limited to nonverbal ones, SAL-UTTs can surface either as NPs or PPs. As long as both of these syntactic categories are stored in the updated contextual information, a nonsentential utterance's CAT feature will be able to match either of them (see Sag & Nykiel 2011 for discussion of this possibility with respect to Polish and Abeillé & Hassamal 2019 with respect to Mauritian).

Another advantage of this analysis of nonsentential utterances is that the content of MAX-QUD can be supplied by either linguistic or nonlinguistic context. MAX-QUD provides the propositional semantics for a nonsentential utterance and is, typically, a unary question. In the prototypical case, MAX-QUD arises from the
most recent *wh*-question uttered in a given context, as in (32), but can also arise (via accommodation) from other forms found in the context, such as constituents in direct sluicing as in (33), or from a nonlinguistic context as in (34).


The analysis of such direct sluices differs only slightly from that illustrated for (32), and in fact all existing analyses of nonsentential utterances (Sag & Nykiel 2011; Ginzburg 2012; Abeillé et al. 2014; Kim 2015; Abeillé & Hassamal 2019; Kim & Abeillé 2019) are based on the Head-Fragment Schema in (31). The direct sluice would have the structure given in Figure 3.

Figure 3: Structure of a sluiced interrogative clause (the root S node's CTXT specifies MAX-QUD $\lambda i[\textit{break}(i, m)]$ and a SAL-UTT whose CAT value and index $i$ are shared with the fragment NP daughter)

The analyses in Figures 2 and 3 differ only in the value of the feature CONT (CONTENT): in the former it is a proposition and in the latter a question.<sup>18</sup>

<sup>18</sup>In-situ languages like Korean and Mandarin allow pseudosluices (sluices with a copula verb), which has led to proposals that posit cleft clauses as their sources (Merchant 2001). However, Kim (2015) suggests that a cleft-source analysis does not extend to languages like Korean, since there is one clear difference between sluicing and cleft constructions: the former allows multiple remnants, while clefts do not license multiple foci. See Kim (2015) for an analysis that differentiates sluicing in embedded clauses (pseudosluices with the copula verb) from direct sluicing in root clauses, as Ginzburg & Sag (2000: 329) do.


This construction-based analysis, in which dialogue updating plays a key role in the licensing of nonsentential utterances, also offers a direction for handling the contrast between merger (35a) and sprouting (35b) examples (recall the discussion in Section 3.1).

	- b. A: I heard that the boy painted the wall. B: \*(With) what?

The difference between (35a) and (35b) is that the preceding antecedent clause in the former includes an overt correlate for the nonsentential utterance, but in (35b) there is only a PP implicitly provided by the argument structure of the verb *paint*. Kim (2015) suggests the following way of analyzing the contrast. Consider the argument structure of the lexeme *paint*:

$$
\text{(36)}\quad \text{The lexeme } \textit{paint}\text{:}\quad
\begin{bmatrix}
\text{PHON } \langle \textit{paint} \rangle\\
\text{CAT|ARG-ST } \langle \text{NP}, \text{NP}, \text{PP[}\textit{with}\text{]} \rangle\\
\text{CONT } \textit{paint}(i, j, x)
\end{bmatrix}
$$

As represented in (36), the verb *paint* takes three arguments. But note that the PP argument can be realized either as an overt PP or as a *pro* expression. In the framework of HPSG, this option for an argument either to be realized as a complement or to remain unexpressed is captured by the Argument Realization Principle (ARP; Ginzburg & Miller 2018: 101; Abeillé & Borsley 2021: 17, Chapter 1 of this volume):<sup>19</sup>

(37) Argument Realization Principle:

$$
\textit{v-lxm} \Rightarrow
\begin{bmatrix}
\text{SUBJ } \boxed{1}\\
\text{COMPS } \boxed{2} \ominus \textit{list}(\textit{noncanon-ss})\\
\text{ARG-ST } \boxed{1} \oplus \boxed{2}
\end{bmatrix}
$$

The ARP tells us that the elements of the ARG-ST list are distributed over the SUBJ and COMPS lists, but that noncanonical elements bearing syntactic-semantic information (including *gap-ss* (*gap-synsem*) and *pro*) need not be realized in the syntax, permitting mismatches between argument structure and syntactic valence features (see Section 5).

<sup>19</sup>This ARP is an adapted version of Ginzburg & Sag (2000: 171) and Bouma et al. (2001: 11).
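
The effect of the ARP can be illustrated with a small sketch. The encoding below (pairs of category and synsem-type strings, and the function `realize`) is our own simplification, not part of the formal theory:

```
# Minimal sketch of the Argument Realization Principle in (37): only
# canonical synsems on ARG-ST are mapped onto the COMPS list.

NONCANONICAL = {"pro", "gap-ss"}

def realize(arg_st):
    """Split ARG-ST into SUBJ (first element) and COMPS (canonical rest)."""
    subj = arg_st[:1]
    comps = [a for a in arg_st[1:] if a[1] not in NONCANONICAL]
    return subj, comps

# 'painted' with an overt PP[with] complement, cf. (38) below:
print(realize([("NP", "canon"), ("NP", "canon"), ("PP[with]", "canon")]))
# 'painted' with a covert pro PP argument, cf. (39) below:
print(realize([("NP", "canon"), ("NP", "canon"), ("PP[with]", "pro")]))
```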


In accordance with the ARP, there will be two lexical items that correspond to the lexeme in (36), depending on the realization of the optional PP argument:

$$
\text{(38)}\quad
\begin{bmatrix}
\text{PHON } \langle \textit{painted} \rangle\\
\text{CAT }
\begin{bmatrix}
\text{SUBJ } \langle \boxed{1}\text{NP}_i \rangle\\
\text{COMPS } \langle \boxed{2}\text{NP}_j, \boxed{3}\text{PP[}\textit{with}\text{]}_x \rangle
\end{bmatrix}\\
\text{ARG-ST } \langle \boxed{1}\text{NP}, \boxed{2}\text{NP}, \boxed{3}\text{PP} \rangle
\end{bmatrix}
$$

$$
\text{(39)}\quad
\begin{bmatrix}
\text{PHON } \langle \textit{painted} \rangle\\
\text{CAT }
\begin{bmatrix}
\text{SUBJ } \langle \boxed{1}\text{NP}_i \rangle\\
\text{COMPS } \langle \boxed{2}\text{NP}_j \rangle
\end{bmatrix}\\
\text{ARG-ST } \langle \boxed{1}\text{NP}_i, \boxed{2}\text{NP}_j, \boxed{3}\text{PP[}\textit{with}\text{]}\langle \textit{pro} \rangle_x \rangle
\end{bmatrix}
$$

The lexical item with an overt PP complement in (38) would project a merger sentence like (35a) while the one with a covert PP in (39) would license the sprouting example in (35b). Each of these two lexical items would then license the partial VP structures in Figure 4 and Figure 5.

 

Figure 4: Structure of a merger antecedent

Let us consider the nonsentential utterance with the merger antecedent in (35a). In this case, the nonsentential utterance can be either the NP *What?* or the PP *With what?* because of the available Dialog Game Board information triggered by the previous discourse.


As can be seen from the structure in Figure 4, the antecedent clause activates not only the PP information but also its internal structure, including the NP within it. The nonsentential utterance can thus be anchored to either of these two, as given in the following:

$$
\text{(40)}\quad
\text{a. }
\begin{bmatrix}
\text{CTXT|SAL-UTT}
\left\{
\begin{bmatrix}
\text{CAT PP[}\textit{with}\text{]}_x\\
\text{CONT } \textit{paint}(i, j, x)
\end{bmatrix}
\right\}
\end{bmatrix}
\qquad
\text{b. }
\begin{bmatrix}
\text{CTXT|SAL-UTT}
\left\{
\begin{bmatrix}
\text{CAT NP}_x\\
\text{CONT } \textit{paint}(i, j, x)
\end{bmatrix}
\right\}
\end{bmatrix}
$$

The SAL-UTT in (40a) is the PP *with something*, projecting *With what?* as a well-formed nonsentential utterance in accordance with (31). Since the overt PP also activates the NP object of the preposition, the discourse can supply that NP as another possible SAL-UTT value, as in (40b). This information then projects *What?* as a well-formed nonsentential utterance in accordance with (31). Now consider (35b). Note that in Figure 5 the PP argument is not realized as a complement even though the verb *painted* takes a PP as its argument value. The interlocutor can have access to this ARG-ST information, but nothing further: the PP argument has no further specifications other than being an implicit argument of *painted*. This means that only this implicit PP can be picked up as the SAL-UTT. This is why the sprouting example allows only a PP as a possible nonsentential utterance. Thus the key difference between merger and sprouting examples lies in what the previous discourse activates via syntactic realizations.<sup>20</sup>

Figure 5: Structure of a sprouting antecedent
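
The merger/sprouting asymmetry just described can be made concrete with a small illustrative sketch (the function and its encoding are ours, not Kim's 2015 formalism):

```
# Sketch of the contrast in (40): possible SAL-UTT categories for the
# PP[with] argument of 'paint'.

def salient_utterances(pp_realization):
    if pp_realization == "overt":
        # Figure 4: the PP and the NP inside it are both activated,
        # so (31) licenses both 'With what?' and 'What?'.
        return ["PP[with]", "NP"]
    # Figure 5: only the implicit PP argument itself is accessible,
    # so only 'With what?' is licensed.
    return ["PP[with]"]

print(salient_utterances("overt"))     # merger antecedent, cf. (35a)
print(salient_utterances("implicit"))  # sprouting antecedent, cf. (35b)
```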

The advantages of the discourse-based analyses sketched here thus follow from their ability to capture limited morphosyntactic parallelism between nonsentential utterances and SAL-UTT without having to account for why nonsentential

<sup>20</sup>We owe most of the ideas expressed here to discussions with Anne Abeillé.


utterances behave differently from constituents of sentential structures. The island-violating behavior of nonsentential utterances is unsurprising on this analysis, as are attested cases of structural mismatch and situationally controlled nonsentential utterances.<sup>21</sup> However, some loose ends still remain. (31) incorrectly rules out case mismatch in languages like Hungarian for speakers that do accept it (see discussion around example (13)).<sup>22</sup>

# **5 Analyses of predicate/argument ellipsis**

The first issue in the analysis of Post-Auxiliary Ellipsis is the status of the elided expression. It is assumed to be a *pro* element due to its pronominal properties (see Lobeck 1995; López 2000; Kim 2003; Aelbrecht & Harwood 2015; Ginzburg & Miller 2018). For instance, Post-Auxiliary Ellipsis applies only to phrasal categories (42), with the exception of pseudogapping as shown in (41); it can cross utterance boundaries (43); it can override island constraints (44)–(45); and it is subject to the Backwards Anaphora Constraint (46)–(47).


One way to account for Post-Auxiliary Ellipsis closely tracks analyses of *pro*-drop phenomena. We do not need to posit a phonologically empty pronoun if a level of argument structure is available where we can encode the required pronominal properties (see Ginzburg & Sag 2000: 330).

<sup>21</sup>The rarity of nonsentential utterances with nonlinguistic antecedents can be understood as a function of how hard or how easily a situational context can give rise to a MAX-QUD and thus license ellipsis. See Miller & Pullum (2014) for this point with regard to Post-Auxiliary Ellipsis.

<sup>22</sup>See, however, Kim (2015) for a proposal that introduces a case hierarchy specific to Korean to explain limited case mismatch in this language.


As we have seen, the Argument Realization Principle in (37) allows an argument to be a noncanonical *synsem* such as *pro* which need not be mapped onto COMPS. For instance, the auxiliary verb *can*, bearing the feature AUX, has a *pro* VP as its second argument in a sentence like *John can't dance, but Sandy can*, that is, this VP is not instantiated as a syntactic complement of the verb.<sup>23</sup> This possibility is represented formally in (48) (see Kim 2003; Ginzburg & Miller 2018):

$$
\text{(48)}\quad \text{Lexical description for } \textit{can}\text{:}\quad
\begin{bmatrix}
\textit{v-lxm}\\
\text{PHON } \langle \textit{can} \rangle\\
\text{CAT }
\begin{bmatrix}
\text{HEAD }
\begin{bmatrix}
\text{VFORM } \textit{fin}\\
\text{AUX } +
\end{bmatrix}\\
\text{SUBJ } \langle \boxed{1} \rangle\\
\text{COMPS } \langle\,\rangle
\end{bmatrix}\\
\text{ARG-ST } \langle \boxed{1}\text{NP}, \text{VP[}\textit{pro}\text{]} \rangle
\end{bmatrix}
$$

The auxiliary in (48) will then project a structure like the one in Figure 6. The head daughter's COMPS list is empty because the second element on the ARG-ST list is a *pro*.<sup>24</sup>
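
Under the same kind of illustrative encoding as the ARP sketch in Section 4 (again our own simplification, not the chapter's formalism), the entry in (48) comes out with an empty COMPS list:

```
# The pro VP on the ARG-ST of 'can' in (48) is noncanonical and
# therefore never surfaces on COMPS, yielding Figure 6.

def comps(arg_st):
    """COMPS = non-subject ARG-ST members with a canonical synsem."""
    return [cat for cat, synsem in arg_st[1:] if synsem == "canon"]

can_entry = [("NP", "canon"), ("VP", "pro")]  # (48): ARG-ST <NP, VP[pro]>
print(comps(can_entry))                       # -> [] (empty COMPS list)
```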

We saw in Section 3.3 that Post-Auxiliary Ellipsis does not require structural identity with its antecedent, which is supplied by the surrounding context. Therefore, ellipsis resolution is not based on syntactic reconstruction in HPSG analyses, but rather on structured discourse information (see Ginzburg & Sag 2000: 295). The *pro* analysis outlined above leads us to expect structural mismatches (and island violations), because the relevant antecedent information is the information that the Dialog Game Board provides via the MAX-QUD in each case, and hence no structural-match requirement is enforced on Post-Auxiliary Ellipsis.<sup>25</sup> This means in turn that HPSG analyses of Post-Auxiliary Ellipsis do not face the problem of having to rule out, or rule in, cases of structural mismatch or nonlinguistic antecedents, because their acceptability can be captured as reflecting discourse-based and construction-specific constraints on Post-Auxiliary Ellipsis.

<sup>23</sup>The rich body of HPSG work on English auxiliaries takes them to be not special Infl categories, but verbs having the AUX value +. See Kim (2000); Kim & Sag (2002); Sag et al. (2003; 2020); Kim & Michaelis (2020).

<sup>24</sup>The same line of analysis could be extended to Null Complement Anaphora, which has received relatively little attention in modern syntactic theory, including in HPSG. However, Null Complement Anaphora is sensitive only to a limited set of main verbs and its exact nature remains controversial.

<sup>25</sup>In the derivational analysis of Merchant (2013), cases of structural mismatch are licensed by the postulation of the functional projection VoiceP above an IP: the understood VP is linked to its antecedent under the IP.

Figure 6: Structure of a Verb Phrase Ellipsis


# **6 Analyses of non-constituent coordination and gapping**

Constructions such as gapping, Right Node Raising, and Argument Cluster Coordination have also often been taken to be elliptical constructions. Each of these constructions has received relatively little attention in the research on elliptical constructions, possibly because of their syntactic and semantic complexities. In this section, we briefly review HPSG analyses of these three constructions, leaving more detailed discussion to Abeillé & Chaves 2021, Chapter 16 of this volume and references therein.<sup>26</sup>

# **6.1 Gapping**

Gapping allows a finite verb to be unexpressed in the non-initial conjuncts, as exemplified below.

<sup>26</sup>We also leave out discussion of HPSG analyses for pseudogapping: readers are referred to Miller (1992); Kim & Nykiel (2020) and Abeillé (2021: Section 4), Chapter 12 of this volume.


	- b. Kim can play the guitar, and Lee the violin.

HPSG analyses of gapping fall into two kinds: one kind draws on Beavers & Sag's (2004) deletion-like analysis of non-constituent coordination (Chaves 2009) and the other on Ginzburg & Sag's (2000) analysis of nonsentential utterances (Abeillé et al. 2014).<sup>27</sup> The latter analyses align gapping with analyses of nonsentential utterances, as discussed in Section 4, more than with analyses of non-constituent coordination, and for this reason gapping could be classified together with other nonsentential utterances. We use the analysis in Abeillé et al. (2014) for illustration below.

Abeillé et al. (2014), focusing on French and Romanian, offer a construction- and discourse-based HPSG approach to gapping where the second, headless, gapped conjunct is taken to be a nonsentential utterance. Their analysis places no syntactic parallelism requirements on the first conjunct and the gapped conjunct, given English data like (50) (note that the bracketed phrases differ in syntactic category).

(50) Pat has become [crazy]AP and Chris [an incredible bore]NP. (Abeillé et al. 2014: 248)

Instead of requiring syntactic parallelism between the two clauses, their analysis limits gapping remnants to elements of the argument structure of the verbal head present in the antecedent (i.e., the leftmost conjunct) and absent from the rightmost conjunct, which reflects the intuition articulated in Hankamer (1971). This analysis thus also licenses gapping remnants with implicit correlates, as illustrated in the following Italian example, where the subject is implicit in the leftmost conjunct and overt in the rightmost conjunct (Abeillé et al. 2014: 251).<sup>28</sup>

(51) Mangio la pasta e Giovanni il riso.
     eat.1SG DET pasta and Giovanni DET rice
     'I eat pasta and Giovanni eats rice.'

The subject in the leftmost conjunct in (51) would be analyzed as a noncanonical *synsem* of type *pro* and the correlate for the remnant *Giovanni*.

<sup>27</sup>For a semantic approach to gapping, the reader is referred to Park et al. (2019), who offer an analysis of scope ambiguities under gapping where the syntax assumed is of the nonsentential utterance type and the semantics is cast in the framework of Lexical Resource Semantics. For more on Lexical Resource Semantics see Koenig & Richter (2021: Section 6.2), Chapter 22 of this volume.

<sup>28</sup>Gapping is possible outside coordination constructions like comparatives as well as in subordinate clauses. See Abeillé & Chaves (2021: Section 7), Chapter 16 of this volume.


Abeillé et al. (2014) adopt two key assumptions in their analysis: (a) coordination phrases are nonheaded constructions in which each conjunct shares the same valence (SUBJ and COMPS) and nonlocal (SLASH) features, while its head (HEAD) value is not fixed but is constrained only by an upper bound (supertype), to accommodate examples like (50), and (b) gapping is a special coordination construction in which the first (full) clause (and not the remaining conjuncts) shares its HEAD value with the mother, and some symmetric discourse relation holds between the conjuncts. To illustrate, the gapped conjunct *Chris an incredible bore* in (50) is a nonsentential utterance with a cluster phrase daughter consisting of two NP daughters, as represented by the simplified structure in Figure 7.

Pat has become crazy and Chris an incredible bore

Figure 7: Simplified structure of a gapping construction

Abeillé et al. (2014) analyze gapping remnants as forming a cluster phrase whose mother has an underspecified syntactic category (this information is represented by the CLUSTER head feature in Figure 7 and in the constraint in (52) below).<sup>29</sup> This cluster phrase then serves as the head daughter of a head-fragment phrase, whose syntactic category is also underspecified. This means that there is no unpronounced verbal head in the phrase to which gapping remnants belong. The meanings of the gapping remnants are computed from the meaning of the leftmost nonelliptical verbal conjunct.

<sup>29</sup>The notion of a cluster refers to any sequence of dependents and was introduced in Mouret (2006)'s analysis of Argument Cluster Coordination. For more detail, see Abeillé & Chaves 2021: 760–763, Chapter 16 of this volume on coordination and Kubota (2021: 1340), Chapter 29 of this volume on the semantics of Argument Cluster Coordination.


In sum, the nonsentential utterance that consists of the gapped conjunct in Figure 7 has a single daughter, a cluster phrase, which in turn has two daughters.

The constraint in (52) illustrates how syntactic parallelism between gapping remnants and their correlates in the leftmost conjunct is operationalized. We saw above that Abeillé et al. (2014) ensure that gapping remnants are arguments of a verbal head located in the leftmost conjunct. They do so by adopting the contextual attribute SAL-UTT, which is introduced for all nonsentential utterances, as in (52) (Abeillé et al. 2014: 259) (for the definition of SAL-UTT, see Section 4).

```
(52) Syntactic constraints on head-fragment-phrase:
     head-fragment-phrase ⇒

     [ CNXT|SAL-UTT      ⟨ [HEAD H1, MAJOR +], …, [HEAD Hn, MAJOR +] ⟩
       CAT|HEAD|CLUSTER  ⟨ [HEAD H1],          …, [HEAD Hn]          ⟩ ]
```
Each list member of the SAL-UTT unifies its HEAD value with the corresponding cluster element, while the feature MAJOR makes each member of the SAL-UTT a major constituent functioning as a dependent of some verbal projection. This analysis does not reconstruct a syntactic gapped clause and predicts that gapping may appear in contexts where a full finite clause cannot, as illustrated in (53).

(53) Bill wanted to meet Jane as well as Jane (\*wanted to invite) him. (Abeillé et al. 2014: 242)
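
The effect of (52) can be sketched as pairwise unification of HEAD values. The toy type table and function names below are our own, with a shared supertype standing in for the underspecified HEAD values that accommodate mismatches like (50):

```
# Illustrative sketch of the constraint in (52): each remnant in the
# cluster unifies its HEAD value with the corresponding correlate on
# the SAL-UTT list.

SUPERTYPES = {"noun": {"nominal"}, "adj": {"nominal"}}  # toy hierarchy

def unify_heads(h1, h2):
    return h1 == h2 or bool(
        SUPERTYPES.get(h1, set()) & SUPERTYPES.get(h2, set()))

def gapping_ok(correlate_heads, remnant_heads):
    return (len(correlate_heads) == len(remnant_heads)
            and all(unify_heads(c, r)
                    for c, r in zip(correlate_heads, remnant_heads)))

# (50): correlates Pat (noun) and crazy (adj); remnants Chris (noun)
# and 'an incredible bore' (noun) -- licensed via the shared supertype.
print(gapping_ok(["noun", "adj"], ["noun", "noun"]))  # True
```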

With syntactic parallelism between the first and the gapped conjuncts captured this way, Abeillé et al. (2014) also allow gapping remnants to appear in a different order than their correlates in the leftmost conjunct (54) (see Sag et al. 1985: 156–158), however limited this possibility is in gapping.

(54) A policeman walked in at 11, and at 12, a fireman.

This ordering flexibility is licensed as long as some symmetric discourse relation holds between the two conjuncts. Abeillé et al. (2014) localize this symmetric discourse relation to the BACKGROUND contextual feature of the Gapping Construction, which is a subtype of coordination.

# **6.2 Right Node Raising**

In typical examples of Right Node Raising, as shown below, the element to the immediate right of a parallel structure is shared with the left conjunct:


(55) a. Kim prepares and Lee eats [the pasta].

b. Kim played and Lee sang [some Rock and Roll songs at Jane's party].

The bracketed shared material can be either a constituent, as in (55a), or a nonconstituent, as in (55b).

Right Node Raising has consistently attracted HPSG analyses involving silent material (a detailed discussion of analyses of Right Node Raising can be found in Abeillé & Chaves 2021: 760–763, Chapter 16 of this volume). All existing analyses of Right Node Raising (Abeillé et al. 2016; Beavers & Sag 2004; Chaves 2014; Crysmann 2008; Shiraïshi et al. 2019; Yatabe 2001; 2012) agree on this point, although some of them propose more than one mechanism for accounting for different kinds of non-constituent coordination (Chaves 2014; Yatabe 2001; 2012; Yatabe & Tam 2021). One strand of research within the Right Node Raising literature adopts a linearization-based approach employed more generally in analyses of non-constituent coordination (NCC) (see Yatabe 2001; 2012; for a general introduction to order domains, see Müller 2021a: Section 6, Chapter 10 of this volume) and another proposes a deletion-like operation (Abeillé et al. 2016; Chaves 2014; Shiraïshi et al. 2019).

The kind of material that may be Right Node Raised and the range of structural mismatches permitted between the left and right conjuncts have been the subject of recent debate.<sup>30</sup> For instance, Chaves (2014: 839–840) demonstrates that, besides more typical examples like (55), there is a range of phenomena classifiable as Right Node Raising that exhibit various argument-structure mismatches as in (56a,b), and that can target material below the word level as in (56c,d).

	- b. Never let me—or insist that I—[pick the seats].
	- c. We ordered the hard- but they got us the soft-[cover edition].
	- d. Your theory under- and my theory over[generates].

Furthermore, Right Node Raising can target strings that are not subject to any known syntactic operations, such as rightward movement, as illustrated in (57) (Chaves 2014: 865).

(57) a. I thought it was going to be a good but it ended up being a very bad [reception].

<sup>30</sup>Although we refer to the material on the left and right as conjuncts, it has been known since Hudson (1976; 1989) that Right Node Raising extends to syntactic environments other than coordination (see Chaves 2014).


b. Tonight a group of men, tomorrow night he himself, [would go out there somewhere and wait].

Right Node Raised material can also be discontinuous, as in (58) (Chaves 2014: 868; Whitman 2009: 238–240).

	- b. The blast upended and nearly sliced [an armored Chevrolet Suburban] in half.

This evidence leads Chaves (2014) to propose that Right Node Raising is a nonuniform phenomenon, comprising extraposition, VP- or N′-ellipsis, and true Right Node Raising. Of the three, only true Right Node Raising should be accounted for via the mechanism of optional surface-based deletion that is sensitive to morph form identity and targets any linearized strings, whether constituents or otherwise.<sup>31</sup> Chaves' (2014: 874) constraint licensing true Right Node Raising is given informally in (59) (where $\varphi$ denotes a morphophonological constituent, $^{*}$ the Kleene star operator, and $^{+}$ the Kleene plus operator):

(59) Backward Periphery Deletion Construction: Given a sequence of morphophonological constituents ω₁⁺ ω₂⁺ ω₃⁺ ω₄⁺ ω₅∗, then output ω₁⁺ ω₃⁺ ω₄⁺ ω₅∗ iff ω₂⁺ and ω₄⁺ are identical up to morph forms.

(59) takes the morphophonology of a phrase to be computed as the linear combination of the morphophonologies of the daughters, allowing deletion to apply locally.<sup>32</sup>
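To make the mechanics of (59) concrete, the following is a minimal procedural sketch in Python (the names and the tokenization are hypothetical, and "identical up to morph forms" is approximated by plain string identity, which is stricter than Chaves' actual condition):

```python
# A procedural sketch of Backward Periphery Deletion (59) over token
# sequences. Simplifications: conjuncts are flat lists of word forms,
# and morph-form identity is approximated by string identity.

def backward_periphery_deletion(left, right):
    """Delete the right periphery of the left conjunct (omega-2 in (59))
    when it matches the right periphery of the right conjunct (omega-4)."""
    n = 0
    while (n < min(len(left), len(right))
           and left[len(left) - 1 - n] == right[len(right) - 1 - n]):
        n += 1
    if n == 0:
        return left + right            # nothing deletable
    return left[:-n] + right           # omega-2 deleted, omega-4 kept

left = "Kim prepares the pasta".split()
right = "and Lee eats the pasta".split()
print(" ".join(backward_periphery_deletion(left, right)))
# -> Kim prepares and Lee eats the pasta
```

Applied to (55a), the shared right periphery *the pasta* of the left conjunct is deleted, while the copy in the right conjunct is pronounced.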

Another deletion-based analysis of Right Node Raising is due to Abeillé et al. (2016) and Shiraïshi et al. (2019); it differs from Chaves (2014) in terms of the identity conditions on deletion. Abeillé et al. (2016) argue for a finer-grained analysis of French Right Node Raising without morphophonological identity. Their empirical evidence reveals a split between functional and lexical categories in French

<sup>31</sup>Whenever Right Node Raising can instead be analyzed as either VP- or N′-ellipsis or extraposition, Chaves (2014) proposes separate mechanisms for deriving them. These are the direct interpretation line of analysis described in the previous sections for nonsentential utterances and predicate/argument ellipsis and an analysis employing the feature EXTRA to record extraposed material along the lines of Kim & Sag (2005) and Kay & Sag (2012). See also Borsley & Crysmann (2021: Section 8.1), Chapter 13 of this volume on extraposition and the EXTRA feature.

<sup>32</sup>For further detail on linearization-based analyses of Right Node Raising, the interested reader is referred to Yatabe (2001; 2012) and to Müller 2021a: Section 6, Chapter 10 of this volume for details of linearization-based approaches in general.


such that the former permit mismatch between the two conjuncts (where determiners or prepositions differ) under Right Node Raising, while the latter do not. Shiraïshi et al. (2019) provide further corpus and experimental evidence that morphophonological identity is too strong a constraint on Right Node Raising, given the range of acceptable mismatches between the verbal forms of the material missing from the left conjunct and those of the material that is shared between both conjuncts.

# **6.3 Argument Cluster Coordination**

Argument Cluster Coordination is a type of non-constituent coordination (NCC), as illustrated in (60):

	- b. John gave [Mary a book] and [Jane a record].

As for the treatment of Argument Cluster Coordination, the existing HPSG analyses have articulated two main views: ellipsis (Yatabe 2001; Crysmann 2008; Beavers & Sag 2004) and non-standard constituents (Mouret 2006). For discussion of the nonelliptical view, which takes Argument Cluster Coordination to be a special type of coordination, we refer the reader to Abeillé & Chaves 2021, Chapter 16 of this volume and references therein. Here we just focus on the ellipsis view, which better fits this chapter.

The ellipsis analysis set forth by Beavers & Sag (2004) gains its motivation from examples like (61).

	- b. Jan wanted to study medicine when he was 11, [law when he was 13], and to study nothing at all when he was 18.

As pointed out by Beavers & Sag (2004), such examples challenge non-ellipsis analyses within the assumption that only constituents of like category can coordinate.<sup>33</sup> The status of the bracketed conjuncts in (61) is quite questionable, since they are not VPs like the other two fellow conjuncts. Beavers & Sag's (2004) proposal is to treat such examples as standard VP coordination with ellipsis of the verb in the second conjunct, as given in (62).

<sup>33</sup>As discussed in Abeillé & Chaves (2021: Section 6), Chapter 16 of this volume and references therein, there are numerous examples (e.g., *Fred became wealthy and a Republican*) where unlike categories are coordinated.


(62) Jan [[travels to Rome tomorrow], [[travels] to Paris on Friday], and [will fly to Tokyo on Sunday]].

Beavers & Sag (2004) further adopt the DOM list machinery proposed as part of linearization theory (see Crysmann 2003 for this proposal), and allow some elements in the daughters' DOM lists to be absent from the mother's DOM list (Yatabe 2001; Crysmann 2003).<sup>34</sup> This idea is encoded in the Coordination Schema, given in (63), which is a simplified version of the one in Beavers & Sag (2004: 27).<sup>35</sup>

(63) Syntactic constraints on *cnj-phrase* (adapted from Beavers & Sag 2004: 27):

*cnj-phrase* ⇒
[ DOM (A) ⊕ B₁ ⊕ C ⊕ B₂ ⊕ (D),
  DTRS ⟨ [DOM (A) ⊕ B₁:*ne-list* ⊕ (D)], [DOM C:*conj* ⊕ (A) ⊕ B₂:*ne-list* ⊕ (D)] ⟩ ]

As specified in (63), there are two constituents contributing a DOM value. The mother's DOM value has the potentially empty material A from the left conjunct (the corresponding material in the right conjunct is elided), a unique element B₁ from the left conjunct, the coordinator C, a unique element B₂ from the right conjunct, and some material D from the right conjunct (the corresponding material in the left conjunct is elided). (63) licenses various types of coordination. For instance, when A is empty, it licenses examples like *Kim and Pat*, but when A is non-empty, it licenses examples like *John gave a book to Mary and a record to Jane*. When both A and D are non-empty, it allows examples like (62). The content of the DOM list consists of prosodic constituents (i.e., constituents with no information about their internal morphosyntax), and this offers a way of accounting for coordination of noncanonical constituents as a type of ellipsis.
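For illustration, here is a plausible instantiation of (63) for *John gave a book to Mary and a record to Jane*, with DOM elements rendered informally as word strings (an expository assumption; the actual DOM elements are prosodic constituents):

$$
\begin{aligned}
A &= \langle\,\text{gave}\,\rangle, & B_1 &= \langle\,\text{a book},\ \text{to Mary}\,\rangle, & C &= \langle\,\text{and}\,\rangle,\\
B_2 &= \langle\,\text{a record},\ \text{to Jane}\,\rangle, & D &= \langle\,\rangle &&
\end{aligned}
$$

On this instantiation, the right daughter contains its own copy of A (a second *gave*), and it is exactly this copy that (63) allows to be absent from the mother's DOM list.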

# **7 Summary**

This chapter has reviewed three types of ellipsis—nonsentential utterances, predicate ellipsis, and non-constituent coordination—which correspond to three kinds of analysis within HPSG. The pattern that emerges from this overview is that HPSG favors the "what you see is what you get" approach to ellipsis and makes limited use of deletion-like operations, accounting for a wider range of corpus and experimental data than derivation-based approaches common in the Minimalist literature.

<sup>34</sup>For detailed discussion of the feature DOM, see Müller (2021a: Section 6), Chapter 10 of this volume.

<sup>35</sup>For simplicity, we represent only the DOM value, suppressing all the other information and further add the parentheses for A and D . For the exact formulation, see Beavers & Sag (2004).


# **Abbreviations**

MAX-QUD Maximal-Question-under-Discussion
SAL-UTT Salient Utterance

# **Acknowledgments**

We thank Anne Abeillé, Jean-Pierre Koenig, and Stefan Müller for substantive discussion and helpful suggestions. We also thank Yusuke Kubota for helpful comments.

# **References**




# **Chapter 20**

# **Anaphoric binding**

# Stefan Müller

# Humboldt-Universität zu Berlin

This chapter is an introduction to the Binding Theory assumed within HPSG. While it was inspired by work on Government & Binding (GB), a key insight of HPSG's Binding Theory is that, contrary to GB's Binding Theory, reference to tree structures alone is not sufficient and reference to the syntactic level of argument structure is required. Since argument structure is tightly related to semantics, HPSG's Binding Theory is a mix of aspects of thematic Binding Theories and entirely configurational theories. This chapter discusses the advantages of this new view and its development into a strongly lexical binding theory as a result of shortcomings of earlier approaches. The chapter also addresses so-called exempt anaphors, that is, anaphors not bound inside of the clause or another local domain.

# **1 Introduction**

Binding Theories deal with questions of semantic identity and agreement of coreferring items. For example, the reflexives in (1) must corefer, and agree in gender, with a coargument:

(1) a. Peter_i thinks that Mary_j likes herself_{∗i/j/∗k}.
	- b. \* Peter_i thinks that Mary_j likes himself_{∗i/∗j/∗k}.
	- c. \* Mary_i thinks that Peter_j likes herself_{∗i/∗j/∗k}.
	- d. Mary_i thinks that Peter_j likes himself_{∗i/j/∗k}.

The indices show what bindings are possible and which ones are ruled out. For example, in (1a), *herself* cannot refer to *Peter*, it can refer to *Mary*, and it cannot refer to some discourse referent that is not mentioned in the sentence (indicated by the index *k*). Binding of *himself* to *Mary* is ruled out in (1b), since *himself*


has an incompatible gender. Expressions like *Mary*, *the morning star*, *Venus*, *fear* are so-called *referring expressions* (r-expressions, Chomsky 1981: 102). They refer to an entity in the discourse. Speakers may use pronouns or reflexives to refer to the same entity. This is coreference. Further, several r-expressions may refer to the same entity. For example, *morning star*, *evening star* and *Venus* refer to the same object. As was mentioned above, English uses grammatical means to help resolve the reference of pronouns and reflexives. Pronouns can also be bound by expressions that do not refer. For example, coindexing can be established with all kinds of nominal expressions, including quantified ones and negated NPs like *no animal* (see Bach & Partee 1980: 128–129).

(2) No animal_i saw itself_i.

So binding is not coreference.

At first glance it may seem possible to account for the binding relations of reflexives at the semantic level with respect to thematic roles (Jackendoff 1972: Section 4.10; Wilkins 1988; Williams 1994: Chapter 6): it seems to be the case that reflexives and their antecedents have to be semantic arguments of the same predicate.<sup>1</sup> For examples like (1), a theory assuming that reflexives and their antecedents have to fill a semantic role of the same head makes the right predictions, since the reflexive is the undergoer of *likes* and the only possible antecedent is the actor of *likes*. <sup>2</sup> However, there are raising predicates like *believe* that do not assign semantic roles to their objects but that nevertheless allow coreference of the raised element and the subject of *believe* (Manning & Sag 1998: 128):<sup>3</sup>

(3) John believes himself to be a descendant of Beethoven.

The fact that *believes* does not assign a semantic role to its object is confirmed by the possibility of embedding predicates with an expletive subject under *believe*: 4

(4) Kim believed there to be some misunderstanding about these issues.

<sup>1</sup>See Riezler (1995) for a way to formalize this in HPSG. See Reinhart & Reuland (1993) for an approach to Binding mixing constraints at the semantic and syntactic level. Kubota (2021: Section 4.3), Chapter 29 of this volume discusses an approach to binding operating on semantic formulae.

<sup>2</sup>See Dowty (1991) and Van Valin (1999) on semantic roles. Dowty suggested role labels like proto-agent and proto-patient and Van Valin proposed the labels actor and undergoer. We use the latter here. See also Davis, Koenig & Wechsler (2021: Section 4.1), Chapter 9 of this volume on actor and undergoer and linking in HPSG.

<sup>3</sup>See Pollard & Sag (1994: Chapter 3.5) and Abeillé (2021), Chapter 12 of this volume on raising. See also Reinhart & Reuland (1993: 679) on Binding Theory and raising.

<sup>4</sup>The example is from Pollard & Sag (1994: 137). See the sources cited above for further discussion.


So, it really is the clause or – to be more precise – some syntactically defined local domain in which reflexive pronouns have to be bound, provided the structure is such that an appropriate antecedent could be available in principle.<sup>5</sup> In cases like (5), no antecedent is available within the clause and in such situations, a reflexive may be bound by an element outside the clause.

(5) John was going to get even with Mary. That picture of himself in the paper would really annoy her, as would the other stunts he had planned.<sup>6</sup>

Reflexives without an element that could function as a binder in a certain local domain are regarded as exempt from Binding Theory. Section 2.3 deals with so-called exempt anaphors in more detail.

Personal pronouns cannot bind an antecedent within the same domain of locality in English:

	- b. Peter_i thinks that Mary_j likes him_{i/∗j/k}.
	- c. Mary_i thinks that Peter_j likes her_{i/∗j/k}.
	- d. Mary_i thinks that Peter_j likes him_{∗i/∗j/k}.

As the examples show, the pronouns *her* and *him* cannot be coreferent with the subject of *likes*. If a speaker wants to express coreference, he or she has to use a reflexive pronoun as in (1).

Interestingly, the binding of pronouns is less restricted than that of reflexives, but this does not mean that anything goes. For example, a pronoun cannot bind a full referential NP if the NP is embedded in a complement clause and the pronoun is in the matrix clause:

	- b. He_{∗i/∗j/k} thinks that Peter_i likes Mary_j.

The sentences discussed so far can be assigned a structure like the one in Figure 1. Chomsky (1981: Section 3.2; 1986: Section 3) suggested that tree-configurational properties play a role in accounting for binding facts. He uses the notion of c(onstituent)-command going back to work by Reinhart (1976). c-command is

<sup>5</sup>Another argument against a Binding Theory relying exclusively on semantics involves different binding behavior in active and passive sentences: since the semantic contribution is the same for active and passive sentences, the difference in binding options cannot be explained in semantics-based approaches. Binding and passive is discussed more thoroughly in Section 4. For a general discussion of thematic approaches to binding see Pollard & Sag (1992: Section 8) and Pollard & Sag (1994: Section 6.8.2).

<sup>6</sup>Pollard & Sag (1994: 270)


Figure 1: Tree configuration of examples for binding

a relation that holds between nodes in a tree. According to one definition, a node c-commands its sisters and the constituents of its sisters.<sup>7</sup>

To take an example, the NP node of *John* c-commands all other nodes dominated by S. The V of *thinks* c-commands everything within the CP, including the CP node; the C of *that* c-commands all nodes in S, including also S; and so on. The CP c-commands the *thinks*-V, and the *likes him*-VP c-commands the *Paul*-NP. By definition, Y binds Z just in case Y and Z are coindexed and Y c-commands Z. One precondition for being coindexed (in English) is that the person, number and gender features of the involved items are compatible, since these features are part of the index.
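For readers who prefer a procedural rendering, Reinhart's definition (quoted in footnote 7) can be sketched over a toy tree as follows (the node labels and the tree fragment are hypothetical simplifications):

```python
# Sketch: c-command per Reinhart (1976): A c-commands B iff neither
# dominates the other and the first branching node dominating A dominates B.

class Node:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    return b is not a and any(c is b or dominates(c, b) for c in a.children)

def c_commands(a, b):
    if dominates(a, b) or dominates(b, a):
        return False
    branching = a.parent
    while branching is not None and len(branching.children) < 2:
        branching = branching.parent
    return branching is not None and dominates(branching, b)

# S -> NP_John VP;  VP -> V_thinks CP
np_john = Node("NP_John")
v = Node("V_thinks")
cp = Node("CP")
vp = Node("VP", [v, cp])
s = Node("S", [np_john, vp])
print(c_commands(np_john, cp))  # True: the subject c-commands all of VP
print(c_commands(v, np_john))   # False: VP does not dominate the subject
```

Note that on this definition sisters c-command each other, unlike on the Sportiche et al. (2013) definition quoted in footnote 7.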

<sup>7</sup> "Node A c(onstituent)-commands node B if neither A nor B dominates the other and the first branching node which dominates A dominates B." Reinhart (1976: 32)

Chomsky (1986) uses another definition that allows one to go up to the next maximal projection dominating A. As of 2020-02-25 the English and German Wikipedia pages for c-command have two conflicting definitions of c-command. The English version follows Sportiche et al. (2013: 168), whose definition excludes c-command between sisters: "Node X c-commands node Y if a sister of X dominates Y."


Now, the goal is to find restrictions that ensure that English reflexives are bound locally, that personal pronouns are not bound locally and that r-expressions like proper names and full NPs are not bound by other expressions (anaphors, personal pronouns or r-expressions). The conditions that were developed for GB's Binding Theory are complex. They also account for the binding of traces that are the result of moving elements by transformations (Chomsky 1981, but given up in Chomsky 1986). While it is elegant to subsume filler-gap relations (and other relations between moved items and their traces) under a general Binding Theory, proponents of HPSG think that coindexed semantic indices and filler-gap dependencies are crucially different.<sup>8</sup> Where traces (if they are assumed at all) can occur is restricted by other components of the theory. For an overview of the treatment of nonlocal dependencies in HPSG, see Borsley & Crysmann (2021), Chapter 13 of this volume.

I will not go into the details of the Binding Theory in Mainstream Generative Grammar (MGG)<sup>9</sup>, but I will give an informal description of the ABC of Binding Theory (ignoring movement). Chomsky distinguishes between so-called r-expressions, personal pronouns, reflexives and reciprocals. The latter two are subsumed under the term *anaphor*. Principle A says that an anaphor must be bound in a certain local domain. Principle B says that a pronoun must not be bound in a certain local domain, and Principle C says that a referential expression must not be bound by another item at all.

Some researchers have questioned whether syntactic principles like Chomsky's Principle C and the respective HPSG variant should be formulated at all, and it has been suggested to leave an account of the unavailability of bindings like the binding of *he* to full NPs in (7) to pragmatics (Bolinger 1979: 302; Bresnan 2001: 227–228; Bouma, Malouf & Sag 2001: 44). Walker (2011: Section 6) discussed the claims in detail and showed why Principle C is needed and how data that was considered problematic for syntactic Binding Theories can be explained in a configurational Binding Theory in HPSG. So, while it ultimately may turn out

<sup>8</sup>The HPSG treatment of relative and interrogative pronouns in each of those types of clause is special, but this is due to their special distribution: they have to be part of a phrase that is initial in the relative or interrogative clause. See Arnold & Godard (2021), Chapter 14 of this volume on relative clauses in HPSG. Bredenkamp (1996: Section 7.2.3) made an early suggestion to model binding relations of personal pronouns and anaphors by the same means as filler-gap dependencies. I will discuss approaches relying on HPSG's general apparatus for nonlocal dependencies without assuming that the phenomena are of the same kind in Section 6.

<sup>9</sup> I follow Culicover & Jackendoff (2005: 3) in using the term *Mainstream Generative Grammar* when referring to work in Government & Binding (Chomsky 1981) or Minimalism (Chomsky 1995).


that Principle C should be dropped from Binding Theory (Varaschin, Culicover & Winkler 2021), the discussion below includes Principle C in its various forms.

# **2 A non-configurational Binding Theory**

As was noted above, English pronouns and reflexives have to agree with their antecedents in gender. In addition, there is agreement in person and number. This is modeled by assuming that referential units come with a referential index in their semantic representation.<sup>10</sup> (On referential indices and coindexation vs. coreference, see Bach & Partee 1980: Section 6.3.) The following makeup of the semantic contribution of nominal objects is assumed.

(8) Representation of semantic information contributed by nominal objects adapted from Pollard & Sag (1994: 248):

$$
\begin{bmatrix}
\textit{nom-obj}\\[2pt]
\text{INDEX} \begin{bmatrix}
\textit{index}\\
\text{PER } \textit{per}\\
\text{NUM } \textit{num}\\
\text{GEN } \textit{gen}
\end{bmatrix}\\[2pt]
\text{RESTRICTIONS } \textit{set of restrictions}
\end{bmatrix}
$$

Every nominal object comes with a referential index with person, number and gender information and a set of restrictions. In the case of pronouns, the set of restrictions is the empty set, but for nouns like *house*, the set of restrictions

<sup>10</sup>There is also resolved agreement in the case of (conjoined or split) antecedents with different gender/person:

(i) b. Tom told Mary that they should leave. (Bresnan 1982: 396)

See Abeillé & Chaves (2021: Section 4.3), Chapter 16 of this volume for more on conjoined antecedents. Anaphoric agreement is also discussed in Chapter 6 (Wechsler 2021: Section 4.1). The approach discussed in Section 6 is powerful enough to introduce additional indices for binding that are not related to individual nodes in a tree like the NP nodes for *Paul* and *Mary* but that represent the set of the combined indices for *Paul* and *Mary*.

See Levine (2010) for special cases, for example, singular gender-neutral *they* (p. 275):

(ii) I know someone who thinks they are the greatest thing since sliced bread.

See also Wechsler (2021: Section 4), Chapter 6 of this volume on the distinction between concord and index agreement.


would contain something like *house*′(x), where x is the referential index of the noun *house*. Nominal objects can be of various types. The types are ordered hierarchically in the inheritance hierarchy given in Figure 2. Nominal objects

Figure 2: Type hierarchy of nominal objects

(*nom-obj*) can either be pronouns (*pron*) or non-pronouns (*npro*). Pronouns can be anaphors (*ana*) or personal pronouns (*ppro*), and anaphors are divided into reflexives (*refl*) and reciprocals (*recp*).
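For instance, under (8), the pronoun *she* and the noun *house* come out roughly as follows (a simplified sketch; the variable x stands for the referential index, as above):

$$
\textit{she: } \begin{bmatrix}
\textit{ppro}\\
\text{INDEX} \begin{bmatrix}
\text{PER } \textit{3rd}\\
\text{NUM } \textit{sg}\\
\text{GEN } \textit{fem}
\end{bmatrix}\\
\text{RESTRICTIONS } \{\;\}
\end{bmatrix}
\qquad
\textit{house: } \begin{bmatrix}
\textit{npro}\\
\text{INDEX } x\\
\text{RESTRICTIONS } \{\,\textit{house}'(x)\,\}
\end{bmatrix}
$$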

HPSG's Binding Theory differs from GB's Binding Theory in referring less to tree structures and more to the notion of obliqueness of arguments of a head. The syntactic arguments of a head are represented in a list called the argument structure list (Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). This list is the value of the feature ARG-ST. The ARG-ST elements are descriptions of arguments of a head containing syntactic and semantic properties of the selected arguments but not their daughters. So, they are not complete signs but *synsem* objects. See Abeillé & Borsley (2021), Chapter 1 of this volume for more on the general setup of HPSG theories. The list elements are ordered with respect to their obliqueness, the least oblique element being the first element (Pollard & Sag 1992: 266):<sup>11</sup>

(9) SUBJECT > PRIMARY OBJECT > SECONDARY OBJECT > OTHER COMPLEMENTS

This order was suggested by Keenan & Comrie (1977: 66). It corresponds to the level of syntactic accessibility of grammatical functions. Elements higher in this

(i) SUBJECT > DIRECT OBJECT > INDIRECT OBJECT > OBLIQUE > GENITIVE > OBJECTS OF COMPARISON

<sup>11</sup>While Pollard & Sag (1987: 120) use Keenan & Comrie's (1977) version of the Obliqueness Hierarchy in (i), they avoid the terms *direct object* and *indirect object* in Pollard & Sag (1992: 266, 280) and Pollard & Sag (1994: 24).


hierarchy are less oblique and can participate more easily in syntactic constructions, such as reductions in coordinated structures (Klein 1985: 15), topic drop (Fries 1988), non-matching free relative clauses (Bausewein 1991: Section 3; Pittner 1995: 195; Müller 1999a: 60–62), passive and relativization (Keenan & Comrie 1977: 96, 68) and depictive predication (Müller 2008: Section 2). In addition, Pullum (1977) and Pollard & Sag (1987: 174) argued that this hierarchy plays a role in constituent order. And, of course, it was claimed to play an important role in Binding Theory (Grewendorf 1983: 176; 1985: 160; 1988: 60; Pollard & Sag 1994: Chapter 6).

The ARG-ST list plays an important role for linking syntax to semantics (Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). For example, the indices of the subject and the object of the verb *like* are linked to the respective semantic roles in the representation of the verb:<sup>12</sup>

(10) *like*:

$$
\begin{bmatrix}
\text{ARG-ST } \langle\, \text{NP}_{1},\ \text{NP}_{2} \,\rangle\\[2pt]
\text{CONT} \begin{bmatrix}
\textit{like}\\
\text{ACTOR } 1\\
\text{UNDERGOER } 2
\end{bmatrix}
\end{bmatrix}
$$

Much more can be said about linking in HPSG, and the interested reader is referred to Wechsler (1995), Davis (2001), Davis & Koenig (2000) and Davis, Koenig & Wechsler (2021), Chapter 9 of this volume.

After these introductory remarks, I now turn to the details of HPSG's Binding Theory. Figure 3 shows a version of Figure 1 including ARG-ST information. The main points of HPSG's Binding Theory can be discussed with respect to this simple figure: (non-exempt) anaphors have to be bound locally. The definition of the domain of locality is rather simple. One does not have to refer to tree configurations, since all arguments of a head are represented locally in a list. Simplifying a bit, reflexives and reciprocals must be bound to elements preceding them in the ARG-ST list (but see Section 2.3 for so-called exempt anaphors) and a pronoun like *him* must not be bound by a preceding element in the same ARG-ST list.
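The effect of this simplified picture can be rendered procedurally (a toy model under stated assumptions: coindexation is reduced to index identity, agreement features are ignored, and an ARG-ST-initial anaphor counts as exempt because it is not locally o-commanded; see Section 2.3):

```python
# A toy check of Principles A and B over an ARG-ST list.
# Each argument is a (type, index) pair, least oblique first,
# with type one of "ana", "ppro", "npro".

def locally_o_bound(arg_st, pos):
    """True if some less oblique coargument (to the left on ARG-ST)
    bears the same index as the element at position pos."""
    _, idx = arg_st[pos]
    return any(i == idx for _, i in arg_st[:pos])

def check_binding(arg_st):
    for pos, (typ, _) in enumerate(arg_st):
        if typ == "ana" and pos > 0 and not locally_o_bound(arg_st, pos):
            return False   # Principle A: o-commanded anaphor unbound
        if typ == "ppro" and locally_o_bound(arg_st, pos):
            return False   # Principle B: locally o-bound pronoun
    return True

# "Mary_i likes herself_i" is fine; "Mary_i likes her_i" is not:
print(check_binding([("npro", "i"), ("ana", "i")]))   # True
print(check_binding([("npro", "i"), ("ppro", "i")]))  # False
```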

To be able to specify the conditions on binding of anaphors, personal pronouns and non-pronouns, some further definitions are necessary. The following definitions are definitions of *local o-command*, *o-command*, and *o-bind*. The terms are

<sup>12</sup>NP₁ is an abbreviation for a feature description of a nominal phrase with the index 1. The feature description in (10) is also an abbreviation. Path information leading to CONT is omitted, since it is irrelevant for the present discussion.


Figure 3: Tree configuration of examples for binding with ARG-ST lists

reminiscent of *c-command*, but the "o" in place of the "c" is intended to indicate the important role of the obliqueness hierarchy. The definitions are as follows:

(11) Let Y and Z be *synsem* objects with distinct LOCAL values, Y referential. Then Y *locally o-commands* Z just in case Y is less oblique than Z.

For some X to be less oblique than Y, it is required that X and Y are on the same ARG-ST list.

(12) Let Y and Z be *synsem* objects with distinct LOCAL values, Y referential. Then Y *o-commands* Z just in case Y locally o-commands X dominating Z.

(13) Y (locally) *o-binds* Z just in case Y and Z are coindexed and Y (locally) o-commands Z. If Z is not (locally) o-bound, it is said to be (locally) *o-free*.
(11) says that an ARG-ST element locally o-commands any other ARG-ST element to the right of it. The condition of non-identity of the two elements under consideration in (11) and (12) is necessary to deal with cases of raising, in which one


element may appear in various different ARG-ST lists. It is also needed to rule out unwanted command relations in the case of nonlocal dependencies, since the local value of a filler is shared with its gap. See Abeillé (2021), Chapter 12 of this volume for discussion of raising in HPSG and Borsley & Crysmann (2021), Chapter 13 of this volume on unbounded dependencies in HPSG. The condition that Y has to be referential excludes expletive pronouns like *it* in *it rains* from entering o-command relations. Such expletives are part of ARG-ST and valence lists, but they are entirely irrelevant for Binding Theory, which is the reason for their exclusion in the definition. Pollard & Sag (1994: 258) discuss the following examples going back to observations by Freidin & Harbert (1983: 65) and Kuno (1987: 95):

	- b. They made sure that it wouldn't bother each other to invite their respective friends to dinner.

According to Pollard & Sag (1994: Section 3.6), the *it* is an expletive. They assume that extrapositions with *it* are accounted for by a lexical rule that introduces an expletive and a *that* clause or an infinitival verb phrase into the valence list of the respective predicates (see also Abeillé & Borsley 2021: Section 4.2, Chapter 1 of this volume). Since the *it* is not referential, it is not a possible antecedent for the anaphors in sentences like (14), and hence a Binding Theory built on the definitions in (11) and (12) will make the right predictions.<sup>13</sup>

The definition of o-command uses the relations of *locally o-command* and *dominate*. With respect to Figure 3, one can say that the matrix subject NP o-commands all nodes below the CP node, because it locally o-commands the CP and the CP node dominates everything below it. So the matrix subject NP o-commands C, the embedded subject NP, the embedded VP, its V and the embedded object NP.

The definition of *o-bind* in (13) says that two elements have to be coindexed and there has to be a (local) o-command relation between them. The indices include person, number and gender information (in English), so that *Mary* can bind *herself* but not *themselves* or *himself*. With these definitions, the binding principles can now be stated as follows:

(15) HPSG Binding Theory (preliminary version)
	- *Principle A*: A locally o-commanded anaphor must be locally o-bound.
	- *Principle B*: A personal pronoun must be locally o-free.
	- *Principle C*: A non-pronoun must be o-free.

<sup>13</sup>But see the discussion of (33c) below.

Principle A accounts for the ungrammaticality of sentences like (16):

(16) a. \* Mary_i likes himself_j.
	- b. *likes*: ARG-ST ⟨ NP_i, NP[*ana*]_j ⟩

Since both *Mary* and *himself* are members of the ARG-ST list of *likes*, there is an NP that locally o-commands *himself*. Therefore there should be a local o-binder. But since the indices are incompatible because of mismatching gender values, *Mary* cannot o-bind *himself*, making *himself* locally o-free and hence in conflict with Principle A.

Similarly, the binding in (17) is excluded, since *Mary* locally o-binds the pronoun *her* and hence Principle B is violated.

(17) a. Mary_i likes her_∗i.
	- b. *likes*: ARG-ST ⟨ NP_i, NP[*ppro*]_∗i ⟩

Finally, Principle C accounts for the ungrammaticality of (18):

(18) a. He_i thinks that Mary likes Peter_∗i.
	- b. *thinks*: ARG-ST ⟨ NP_i, CP ⟩
	- c. *likes*: ARG-ST ⟨ NP, NP[*npro*]_∗i ⟩

Since *he* and *Peter* are coindexed and since *he* o-commands *Peter*, *he* also o-binds *Peter*. According to Principle C, this is forbidden, and hence bindings like the one in (18a) are ruled out.

# **2.1 Ditransitives**

For ditransitives, there are three elements on the ARG-ST list: the subject, the primary object and the secondary object. If the secondary object is a reflexive, Principle A requires this reflexive to be coindexed with either the primary object or the subject. Hence, the bindings in (19) are predicted to be possible and (20) is out, since neither *I* nor *you* is a possible binder of *herself* because of person mismatches:

	- b. John_i showed Mary himself_i. ARG-ST ⟨ NP_i, NP_j, NP[*ana*]_i ⟩


(20) \* I showed you herself. ARG-ST ⟨ NP, NP, NP[*ana*] ⟩

Note that configuration-based Binding Theories like the one entertained in GB and Minimalism require the primary object to c-command the secondary object but not vice versa. This results in theories that have to assume certain branchings and, in some cases, even auxiliary nodes (Adger 2003: Section 4.4). In HPSG, the branching that is assumed does not depend on binding facts, and, indeed, ternary branching VPs (Pollard & Sag 1994: 40) as well as binary branching ones have been assumed (see Müller 2021a: Section 3, Chapter 10 of this volume for discussion).

The list-based Binding Theory outlined above seems very simple. So far I have explained binding relations between coarguments of a head where the coarguments are NPs or pronouns. But there are also prepositional objects, which have an internal structure with the referential NPs embedded within a PP. Pollard & Sag (1994: 246, 255) discuss examples like (21):

(21) a. John_i depends [on him_∗i].
	- b. Mary talked [to John_j] [about himself_j].

As noted by Bach & Partee (1980: 137, Section 6.5.6), Chomsky (1981: 226), and Pollard & Sag (1994: 246), examples like the second one are a problem for the GB Binding Theory, since *John* is inside the PP and does not c-command *himself*. See Figure 4. Examples involving case-marking prepositions are no problem for

Figure 4: Binding within prepositional objects poses a challenge for GB's Binding Theory


HPSG, however, since it is assumed that the semantic content of prepositions is identified with the semantic content of the NP they select. Hence, the PP *to John* has the same referential index as the NP *John* and the PP *about himself* has the same index as *himself*. The ARG-ST list of *talked* is shown in (22):

(22) *talked*:

ARG-ST ⟨ NP_i, PP[*to*]_j, PP[*about*, *ana*]_j ⟩

The Binding Theory applies as it would apply to ditransitive verbs. Since the first PP is less oblique than the second one, it can bind an anaphor in the second one. The same is true for the example in (21a) and the lexical item for *depend* with the ARG-ST in (23):

(23) *depend*: ARG-ST ⟨ NP_i, PP[*on*, *ppro*]_∗i ⟩

Since the subject is less oblique than the PP object, it locally o-commands the PP, and even though the pronoun *him* is embedded in a PP and not a direct argument of the verb, the pronoun cannot be bound by *John*. An anaphor would be possible within the PP object, though. And of course the subject NP can bind NPs within both PP arguments of *talked*: both *to herself* and *about herself* would be possible as well.
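The assumption at work here can be depicted for the case-marking preposition *to* roughly as follows (a schematic entry with attribute paths abbreviated, not a full lexical item):

$$
\textit{to: } \begin{bmatrix}
\text{HEAD } \textit{prep}\\
\text{COMPS } \langle\, \text{NP}_i \,\rangle\\
\text{CONT|INDEX } i
\end{bmatrix}
$$

Since the preposition's index is identified with that of its NP complement, the PP as a whole bears the index of the embedded NP; this is all that the obliqueness-based binding principles need to see.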

# **2.2 Binding and nonlocal dependencies**

Examples like (24) are covered by HPSG's Binding Theory, since *himself* is fronted via HPSG's nonlocal mechanism (see Borsley & Crysmann 2021, Chapter 13 of this volume) and there is a connection between the fronted element and the missing object.

(24) a. Himself_i, John_i admires \_.
	- b. *admire*: ARG-ST ⟨ NP_i, NP[*gap*, *ana*]_i ⟩

Therefore, the LOCAL value of *himself* is identified with the LOCAL value of the object in the ARG-ST list of *admires*, and since the object is local to the subject of admire, it has to be bound by the subject. But there is more to say about binding and nonlocal dependencies in HPSG. Pollard & Sag (1994: 265) point out an interesting consequence of the HPSG treatment of nonlocal dependencies: since nonlocal dependencies are introduced by traces that are lexical elements, rather than by deriving one structure from another as is common in Transformational Grammar, there is no way to reconstruct a phrase with all its internal structure


into the position of the trace. Since traces do not have daughters, \_ in (25) has the same local properties (part of speech, case, referential index) as *which of Claire's friends*, without having its internal structure.

(25) I wonder [which of Claire's friends] [we should let her invite \_ to the party]?

Since extracted elements are not reconstructed into the position where they would be usually located, (25) is not related to (26):

(26) We should let her_∗i invite which of Claire_i's friends to the party?

*Claire* would be o-bound by *her* in (26), violating Principle C, but since traces do not have daughters, no problem arises in (25).

Some of the more recent theories of nonlocal dependencies even do without traces (Bouma, Malouf & Sag 2001). These are discussed in more detail in Borsley & Crysmann (2021), Chapter 13 of this volume. For the treatment of binding data, it does not matter whether there is a trace or not: traceless accounts of extraction assume that members of the ARG-ST list, which contains all arguments, are not mapped onto the valence lists. So for the lexical item in (10), one would assume the two variants in (28) that play a role in the analysis of the sentences in (27):

(27) a. I like bagels.

b. Bagels, I like.

(28) a. *like* with the object on the valence lists: SUBJ ⟨ 1 ⟩, COMPS ⟨ 2 ⟩, ARG-ST ⟨ 1 NP, 2 NP ⟩
	- b. *like* with an extracted object: SUBJ ⟨ 1 ⟩, COMPS ⟨ ⟩, ARG-ST ⟨ 1 NP, NP[*gap*] ⟩

*gap* stands for a special type that is used to indicate that a certain argument is a gap rather than an overtly realized element. Gaps pass their nonlocal information up to the mother node, which is indicated by a slash in the figures in Figure 5. The traceless analysis does not differ from the trace-based approach as far as the makeup of the ARG-ST list is concerned. In a trace-based analysis, the trace is an argument of the verb. Thus, the description of the accusative object is identified with the description in the COMPS list, and this element is identical to the second element of the ARG-ST list. This means that we can talk about the same ARG-ST configurations for both types of theories and abstract away from the concrete realization of extraction. Pollard & Sag's (1994: 265) analysis of (25) works in both worlds: in the traceless analysis, there is no element that could


Figure 5: Traceless and trace-based analyses of fronting

have daughters, and in the trace-based analysis, there is a trace, but since traces are simple lexical items in HPSG without internal structure (Pollard & Sag 1994: 164), there is nothing like the "reconstruction" known from GB.<sup>14</sup>

(i) a. [Karl_i's friend]_j, he_∗i knows \_j.
	- b. Karls_i Freund kennt er_∗i.
	   Karl's friend knows he
	   'He knows Karl's friend.'

According to the definition of o-command, *he* locally o-commands the object of *knows*. This object is a gap whose LOCAL properties are those of *Karl's friend*, but since gaps/traces do not have daughters, there is no o-command relation between *he* and *Karl*; hence *Karl* is o-free and Principle C is not violated. Thus, there is no explanation for the impossibility of binding *Karl* to *he*. In order to fix this, the definition of dominance could be changed so that GB's notion of reconstruction would be mimicked (Müller 1999b: 409–410). According to Müller's definition, a trace or gap would "dominate" the daughters of its filler. While this would account for cases like (i), the account of (25) would be lost.

Steve Wechsler (p.c. 2021) pointed out that the totally non-configurational Binding Theory that is discussed in Section 3 also "reconstructs" the fronted element into the position of the gap. Filler and gap share LOCAL values, and since *which* is the head of *which of Claire's friends*, there is an o-command relation between *her* and *Claire*, and hence (25) should be ungrammatical.

Alternatively, one might not assume "reconstruction", instead explaining the effects by different means like pragmatics or processing, as was suggested in the references cited above.

<sup>14</sup>Müller (1999b: Section 20.2) discussed the examples in (i), which seem to be problematic for theories in which the internal structure of extracted material plays no role:


# **2.3 Exempt anaphors**

The statement of Principle A has interesting consequences: if an anaphor is not locally o-commanded, Principle A does not say anything about requirements for binding. This means that anaphors that are initial in an ARG-ST list may be bound outside of their local environment. Example (5) from Pollard & Sag (1994: 270) – repeated here as (29) for convenience – shows that a reflexive can even be bound to an antecedent outside of the sentence:

(29) John was going to get even with Mary. That picture of himself in the paper would really annoy her, as would the other stunts he had planned.<sup>15</sup>

Further examples are NPs within adverbial PPs. Since there is nothing in the PP *around himself* that is less oblique than the reflexive, the principles governing the distribution of reflexives do not apply, and hence both a pronoun and an anaphor are possible:<sup>16</sup>

(30) a. John_i wrapped a blanket around him_i.
	- b. John_i wrapped a blanket around himself_i.

Which of the pronouns is used is said to depend on the *point of view* of the speaker (Kuroda 1973; for further discussion and a list of references, see Pollard & Sag 1994: 270).

The exemptness of anaphors seems to cause a problem, since the Binding Theory does not rule out sentences like (31):

(31) \* Himself sleeps.

This is not a real problem for languages like English, since such sentences are ruled out anyway; *sleeps* requires an NP in the nominative and *himself* is accusative (Brame 1977: 388; Pollard & Sag 1994: 262). But Müller (1999b: Section 20.4.6) pointed out that German has subjectless verbs like *frieren* 'be cold' and *dürsten* 'be thirsty' that govern an accusative:

(32) a. Den Mann friert.
	   the.ACC man cold.is
	   'The man is cold.'

<sup>15</sup>Pollard & Sag (1994: 270)

<sup>16</sup>There are various conflicting judgements of examples like (30) in the literature. For an overview and an experiment confirming the judgement in (30), see Golde (1999: Chapter 3).



However, as Kiss (2012: 158, 161) – discussing his own data and referring to Frey (1993: 131) – pointed out, anaphors are not exempt in German. So, examples like (32b) and (32d) are correctly ruled out by a general ban on unbound anaphors in German.

The contrast in (33) seems to be problematic. The analysis suggested by Pollard & Sag (1994: 149) assumes that an extraposition *it* is inserted into the ARG-ST list and the clause is appended to this list:

	- b. It bothers me that Sandy snores. *bother*: ARG-ST ⟨ NP[*it*], NP[*ppro*], S ⟩
	- c. \* It bothers myself that Sandy snores. *bother*: ARG-ST ⟨ NP[*it*], NP[*ana*], S ⟩

According to Pollard & Sag (1994: 149), the *it* in (33b–c) is non-referential. This would mean that there is nothing that o-commands the accusative object, making anaphors exempt in the object position, and hence sentences like (33c) would be predicted to be grammatical. However, they are not, which seems to argue for an analysis that treats the extraposition *it* as a referential element (Müller 1999b: 215, 232).

# **2.4 Inalienable possession NPs**

Koenig (1999) examines examples like (34) in which a definite noun phrase is interpreted as a body part of some other argument of the involved verb. Koenig discusses French data, but a parallel construction exists in German as well.<sup>18</sup>

<sup>17</sup>Fanselow (1986: 349)

<sup>18</sup>See also Sailer (2021: 796), Chapter 17 of this volume for discussion of body parts in the context of idioms.


(34) Marc a avancé le pied. (French)
	   Marc has advanced the foot
	   'Marc moved his foot forward.'

Koenig argues that these inalienable possession NPs should be interpreted by making recourse to the same mechanism as used in Binding Theory, rather than argument linking (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume on linking).<sup>19</sup> In addition to what Binding Theory predicts, he defines the concept of an Active Zone (Langacker 1984) in order to further restrict possible candidates for the possessor. He also formulates restrictions that have to hold on semantic roles filled by the possessor and the body part. Although an exploration of all of this would take us too far away from the topic at hand, I want to discuss Koenig's lexical rule for possessive nouns, which he assumes to be similar to what is given in (35):


(35) Lexical rule for body part nouns adapted from Koenig (1999: 256):

The lexical rule maps a body part noun selecting for a possessive NP ( 1 ) via its SPR feature onto a body part noun selecting for a definite article. Since by convention features not mentioned in the lexical rule are taken over from the input, the output has the same CONT value as the input. The output has the specification that the element in the ARG-ST is of type *refl* and *s-ana*. Pronominal elements of this type behave like reflexive pronouns and have to be bound in the subject domain.
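Schematically, and based only on the prose description just given (Koenig's actual formulation is richer), the rule maps:

$$
\begin{bmatrix}
\text{SPR } \langle\, 1\,\text{NP}[\textit{poss}] \,\rangle\\
\text{ARG-ST } \langle\, 1 \,\rangle
\end{bmatrix}
\;\mapsto\;
\begin{bmatrix}
\text{SPR } \langle\, \text{DET} \,\rangle\\
\text{ARG-ST } \langle\, \text{DET},\ \text{NP}[\textit{refl} \wedge \textit{s-ana}] \,\rangle
\end{bmatrix}
$$

As stated above, the CONT value, including the inalienable possession restriction, is carried over unchanged from input to output.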

<sup>19</sup>Steve Wechsler (p.c., 2021) pointed out that in the light of Schwarz's (2019) theory of weak definites, a reanalysis of the phenomena discussed in this section may be possible. I leave this for further research.


 

(36) a. Body part noun with possessive pronoun:

$$
\begin{bmatrix}
\text{CAT} \begin{bmatrix}
\text{HEAD } \textit{noun}\\
\text{SPR } \langle\, 1 \,\rangle\\
\text{COMPS } \langle\,\rangle\\
\text{ARG-ST } \langle\, 1\,\text{NP}_2 \,\rangle
\end{bmatrix}\\[4pt]
\text{CONT} \begin{bmatrix}
\text{INDEX } 3\\
\text{RESTR } \left\{\begin{bmatrix}
\textit{inal-poss-rel}\\
\text{POSSESSOR } 2\\
\text{POSSESSED } 3
\end{bmatrix}\right\}
\end{bmatrix}
\end{bmatrix}
$$

b. Body part noun with definite determiner:

$$
\begin{bmatrix}
\text{CAT} \begin{bmatrix}
\text{HEAD } \textit{noun}\\
\text{SPR } \langle\, \text{DET} \,\rangle\\
\text{COMPS } \langle\,\rangle\\
\text{ARG-ST } \langle\, \text{DET},\ \text{NP}[\textit{refl} \wedge \textit{s-ana}]_2 \,\rangle
\end{bmatrix}\\[4pt]
\text{CONT} \begin{bmatrix}
\text{INDEX } 3\\
\text{RESTR } \left\{\begin{bmatrix}
\textit{inal-poss-rel}\\
\text{POSSESSOR } 2\\
\text{POSSESSED } 3
\end{bmatrix}\right\}
\end{bmatrix}
\end{bmatrix}
$$

 The two lexical items can be used to analyze (37) and (34), respectively.

(37) Marc a avancé son pied.
	   Marc has advanced his foot
	   'Marc moved his foot forward.'

While in (37) a possessive pronoun is selected by the body part noun in (36a), this is not the case in the analysis of (34). But in terms of binding, the situation is similar: in both sentences there is an initial element in the ARG-ST that is linked to the possessor role of the noun. The possessive pronoun has, of course, a pronominal index, and the NP in the ARG-ST in (36b) has a pronominal index as well, since this is what was specified in the lexical rule. So Koenig's approach can account for the data without assuming any additional structure or additional empty pronominal elements.

# **2.5 Long-distance reflexives**

A lot of work on binding in various frameworks deals with English and how to formulate the ABC of Binding Theory. However, work by Dalrymple (1993) shows convincingly that there is considerable crosslinguistic variation. Following Dalrymple, researchers working in HPSG have suggested various types of


pronominal elements that have to be bound in various domains (Abeillé et al. 1998; Koenig 1999; Xue et al. 1994; Pollard & Xue 1998; Branco & Marrafa 1999; Hellan 2005). Those working on languages that have so-called long-distance reflexives like Mandarin Chinese, Portuguese and Norwegian (Xue, Pollard & Sag 1994; Pollard & Xue 1998; Branco & Marrafa 1999; Hellan 2005) have suggested a fourth binding principle.<sup>20</sup> In such languages, there are pronouns that must be bound, but they may be bound locally or non-locally. Such pronouns are called Z-pronouns, and the binding principle responsible for them is Principle Z (Branco & Marrafa 1999: 171). Adding Principle Z to the preliminary version of HPSG's Binding Theory, we get:

(38) HPSG Binding Theory
	- *Principle A*: A locally o-commanded anaphor must be locally o-bound.
	- *Principle B*: A personal pronoun must be locally o-free.
	- *Principle C*: A non-pronoun must be o-free.
	- *Principle Z*: An o-commanded anaphor must be o-bound.
Principle Z is like Principle A, but with the requirement that anaphors must be o-bound rather than locally o-bound. The requirement to be o-bound includes the option of being locally o-bound, but nonlocal o-binding is possible as well.

# **3 A totally non-configurational Binding Theory**

The initial definition of o-command contains the notion of dominance and hence makes reference to tree structures. Pollard & Sag (1994: 279) pointed out that the binding of *John* by *he* in (39a) is correctly ruled out because *he* o-commands the trace of *John*, and hence Principle C is violated. But since they follow GPSG in assuming that English has no subject traces (Pollard & Sag 1994: Chapter 4.4), this account would not work for (39b).

	- b. John_∗i, he_i claimed left.

Later work in HPSG abolished traces altogether (Bouma, Malouf & Sag 2001; Borsley & Crysmann 2021, Chapter 13 of this volume; but see Müller & Machicao y Priemer 2019: Section 4.9 for a trace-based approach and Müller 2004; 2020: Chapter 19 on empty elements in general), and hence Binding Theory cannot rely on dominance any longer. This section deals with the revised version of Binding Theory that does not make reference to dominance. The revised non-

<sup>20</sup>For discussion of some of these languages and further examples from other languages and an analysis in LFG, see Dalrymple (1993).


configurational variant of o-command suggested by Pollard & Sag (1994: 279) has the form in (40):<sup>21</sup>

(40) Let Y and Z be *synsem* objects with distinct LOCAL values, Y referential. Then Y *o-commands* Z just in case
	- i. Y is less oblique than Z; or
	- ii. Y o-commands some X that has Z on its ARG-ST list; or
	- iii. Y o-commands some X that is a projection of Z (i.e., the HEAD values of X and Z are token-identical).

The o-command relation can be explained with respect to Figure 6.

Figure 6: Tree for explanation of the o-command relation

According to the definition in (40), B o-commands E by the definition's clause i, since B and E are in the ARG-ST list of *thinks* and B is less oblique than E. B o-commands F, since it o-commands E and E is a projection of F (clause iii). B also o-commands G, since B o-commands F and F has G on its ARG-ST list (clause ii). Since B o-commands G, it also o-commands J, since G is a projection of J (clause iii). And because of all this, B also o-commands H and K, since B o-commands J and both H and K are members of the ARG-ST list of J (clause ii).
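The recursion in (40) lends itself to a procedural sketch (a toy encoding under stated assumptions: signs are reduced to plain names, and the HEADS and PROJECTS tables are hypothetical bookkeeping structures modeling the fragment of Figure 6 just discussed):

```python
# Sketch of the recursive, dominance-free o-command of (40).
# HEADS maps a lexical sign to its ARG-ST list (least oblique first);
# PROJECTS maps a phrasal sign to the lexical sign heading it.

def o_commands(y, z, seen=None):
    seen = set() if seen is None else seen
    if (y, z) in seen:            # guard against revisiting a pair
        return False
    seen.add((y, z))
    # clause i: Y is less oblique than Z on some ARG-ST list
    for args in HEADS.values():
        if y in args and z in args and args.index(y) < args.index(z):
            return True
    # clause ii: Y o-commands some X that has Z on its ARG-ST list
    for x, args in HEADS.items():
        if z in args and o_commands(y, x, seen):
            return True
    # clause iii: Y o-commands some X that is a projection of Z
    for x, head in PROJECTS.items():
        if head == z and o_commands(y, x, seen):
            return True
    return False

# thinks has ARG-ST <B, E>; E is a projection of F; F has ARG-ST <G, H>
HEADS = {"thinks": ["B", "E"], "F": ["G", "H"]}
PROJECTS = {"E": "F"}
print(o_commands("B", "E"))  # True, by clause i
print(o_commands("B", "F"))  # True, by clause iii (via E)
print(o_commands("B", "G"))  # True, by clause ii (via F)
```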

<sup>21</sup>I have replaced "subcategorized by" with reference to the ARG-ST list.


This recursive definition of o-command is impressive in that it can account for binding phenomena in approaches that do not have empty nodes for traces in the tree structures. However, there are still open issues.<sup>22, 23</sup>

As was pointed out by Hukari & Levine (1996: 490), Müller (1999b: Section 20.4.1) and Walker (2011), adjuncts pose a challenge for the non-configurational Binding Theory. For example, a referential NP can be part of an adjunct, and since adjuncts are usually not part of ARG-ST lists, they would not be covered by the definition of o-command given above. *John* is part of the reduced relative clause modifying *woman* in (41).

(41) He_∗i knows the woman loved by John_i.

Since the relative clause does not appear on any ARG-ST list, *he* does not o-command *John*, and hence there is no Principle C violation and the binding should be fine, yet it is not.

Several authors suggested including adjuncts into ARG-ST lists of verbs (Chung 1998: 168; Przepiórkowski 1999: 240; Manning, Sag & Iida 1999: 60), but this would result in conflicts with Binding Theory if applied to the nominal domain (Müller 1999b: Section 20.4.1). The reason is that nominal modifiers have a semantic contribution that contains an index that is identical to the index of the modified noun.<sup>24</sup> If there are several such modifiers, we get a conflict, since we

(i) Sie_∗i kennt das Kim_i begeisternde Buch.
	   she knows the Kim enthusing book
	   'She knows the book enthusing Kim.'

<sup>22</sup>One was already mentioned in footnote 19.

<sup>23</sup>Note that the label *totally non-configurational Binding Theory* seems to suggest that dominance relations do not play a role at all, and hence this version of Binding Theory could be appropriate for HPSG flavors like Sign-Based Construction Grammar (SBCG) that do not have daughters in linguistic signs (see Sag 2012 and Müller 2021c: Section 1.3.2, Chapter 32 of this volume for discussion). But this is not the case. The definition of o-command in (40) contains the notion of projection. While this notion can be formalized with respect to a complex linguistic sign having daughters in Constructional HPSG, as assumed in this volume, this is impossible in SBCG, and one would have to refer to the derivation tree, which is something external to the linguistic signs licensed by a SBCG theory. See also footnote 8.

<sup>24</sup>See Arnold & Godard (2021: Section 2.2), Chapter 14 of this volume and Müller (1999a) on relative clauses. Sag (1997) suggests an approach to relative clauses in which a special schema is assumed that combines the modified noun with a verbal projection. This approach does not have the problem mentioned here. However, prenominal adjuncts would remain problematic as the following example (based on Müller 1999b: 412) shows:

The adjectival participle behaves like a normal adjectival modifier. For Principle C to make the right predictions, there should be a command relation between *sie* 'she' and the parts of the prenominal modifier. See also Arnold & Godard (2021: 631), Chapter 14 of this volume. PP adjuncts within nominal structures like *the house in the valley* are a further instance of problematic examples.


have several coindexed non-pronominal indices on the same ARG-ST list, which would violate Principle C.

There are two possible solutions that come to mind. The first one is fairly ad hoc: one can assume two different features for different purposes. There could be the normal index for establishing coindexation between heads and adjuncts and heads and arguments, and there could be a further index for binding. Adjectives would then have a referential index for establishing coindexation with nouns and an additional index referring to a state, which would be irrelevant for the binding principles.

The second solution to the adjunct problem might be seen in defining o-command with respect to the DEPS list. The DEPS list is a list of dependents that is the concatenation of the ARG-ST list and a list of adjuncts that are introduced on this list (Bouma, Malouf & Sag 2001: 12). Binding would be specified with respect to ARG-ST and dominance with respect to DEPS (which includes everything on ARG-ST). The lexical introduction of adjuncts has been criticized because of scope issues by Levine & Hukari (2006: 153), but there are also problems related to binding. Hukari & Levine (1996: 490) pointed out that there are differences when it comes to the interpretation of pronouns in examples like (42a,b) and (42c,d):

	- b. They_i went into the city without the twins_{∗i/j} being noticed.
	- c. You can't say anything to them_i without the twins_{i/j} being offended.
	- d. You can't say anything about them_i without Terry criticizing the twins_{i/j} mercilessly.

While the subject pronoun cannot be coreferential with *the twins* inside the adjunct, the object pronoun in (42c,d) can. In relation to the discussion of examples like (42), Walker (2011: 233) noted that whether binding of the subject pronoun is possible also depends on the attachment position of the adjunct. While binding of a subject pronoun into a VP adjunct is impossible (43a), binding into a sentential adjunct is fine (43b).

	- b. They hadn't been on the road for half an hour [when the twins noticed that they had forgotten their money, passports and ID].

If we simply register adjuncts on the DEPS list, we are unable to refer to their position in the tree, and hence we cannot express any statement needed to cover the differences in (43). Note that this is crucially different for elements on the ARG-ST list in English, since the ARG-ST of a lexical item basically determines the


trees it can appear in in English: the first element appears to the left of the verb as the subject, and all other elements appear to the right of the verb as complements. However, this is just an artifact of the rather strict syntactic system of English. It is not the case for languages with freer constituent order like German, which causes problems for Binding Theories that do not take the linearization of elements into account (see Grewendorf 1985: 140 and Riezler 1995: 12 for crucial examples).

There is another issue related to the totally non-configurational version of the Binding Theory: in 1994, HPSG was strictly head-driven. There were rather few schemata, and most of them were headed. Since then, more and more constructional schemata were suggested that do not necessarily have a head. For example, relative clauses were analyzed as involving an empty relativizer (Pollard & Sag 1994: Chapter 5; Arnold & Godard 2021: Section 2.2, Chapter 14 of this volume). One way to eliminate this empty element from grammars is to assume a headless schema that directly combines the relative phrase and the clause from which it is extracted (Müller 1999a: Section 2.7; Sag 2010: 522; Müller & Machicao y Priemer 2019: 345).<sup>25</sup> In addition, there were proposals to analyze free relative clauses such that the relative phrase is the head (Wright & Kathol 2003: 383). So, if *whoever* is the head of *whoever is loved by John*, the whole relative clause is not a projection of *loved*. Furthermore, *is loved by John* is not an argument of *whoever*, and hence there is no appropriate connection between the involved elements. This means that the arguments of *loved* will not be found by the definition of ocommand in (40). Consequently, *John* is not o-commanded by *he*, which predicts that the binding in (44) is possible, but it is not.

(44) He_∗i knows whoever is loved by John_i.

Further examples of phenomena that are treated using unheaded constructions are serial verbs in Mandarin Chinese: Müller & Lipenkova (2009) argue that VPs are combined to form a new complex VP with a meaning determined by the combination. None of the combined VPs contributes a head. No VP selects for another VP.

There seems to be no way of accounting for such cases without the notion of dominance (but see Section 6 for a lexical solution). For those insisting on grammars without empty elements, the solution would be a fusion of the definition given in (40) with the initial definition involving dominance in (12). Hukari & Levine (1995) suggested such a fusion. This is their definition of vc-command:

<sup>25</sup>See Sag (1997) for another suggestion without empty relativizers.


(45) v(alence-based) c-command:

Let α be an element on a valence list that is the value of the valence feature γ and α′ the DTRS element whose SYNSEM value is structure-shared with α. Then if the constituent that would be formed by α′ and one or more elements β has a null list as its value for γ, α vc-commands β and all its descendants.

Rewritten in more understandable prose, this definition means that if we have some constituent α′, then its counterpart α in the valence list vc-commands all siblings of α′ and their descendants, provided the valence list on which α is selected is empty at the next higher node. We have two valence lists that are relevant in the verbal domain: SUBJ (some authors use SPR instead) and COMPS. The COMPS list is empty at the VP node and the SUBJ list is empty at the S node. So, the definition in (45) makes statements about two nodes in Figure 7: the lower VP node and the S node. For Figure 7, this entails that the object NP *the car* vc-commands *bought*, since *the car* is an immediate daughter of the first projection with an empty COMPS list. The NP *they* vc-commands the VP *bought the car without anybody noticing the twins*, since both are immediately dominated by the node with the empty SUBJ list.

Figure 7: Example tree for vc-command: the subject vc-commands the adjunct because it is in the valence list of the upper-most VP and this VP dominates the adjunct PP


The proposal by Hukari & Levine was criticized by Walker (2011: 235), who argued that the modal component *would be formed* in the definition is not formalizable. Walker suggested the following revision:

(46) vc-command (revised): α vc-commands β′ iff

	- i. γ′: [SS|LOC|CAT|SUBJ ⟨ α ⟩] and γ′ dominates β′, or
	- ii. α locally o-commands γ and γ′ dominates β′.

Principle C is then revised as follows:

(47) Principle C: A non-pronominal must neither be bound under o-command nor under a vc-command relation.

Walker uses the tree in Figure 8 to explain her definition of vc-command. The second clause in the definition of vc-command is the same as before: it is based on local o-command and domination. What is new is the first clause. Because of this clause, the subject vc-commands the adjunct, since the subject 1 (α) is in the SUBJ list of the top-most VP and this top-most VP (γ′) dominates the adjunct PP (β′).

Figure 8: Version of Figure 7 using Walker's labels α, β, and γ

Apart from the elimination of the modal component in the definition of vc-command, there is a further difference between Hukari & Levine's and Walker's definitions: the former applies to Specifier-Head structures, in which the singleton element of the SPR list is saturated. We will return to this in Section 6.1. Note also that the definition of Hukari & Levine includes the sibling VP among the items commanded by the subject, while Walker's definition includes elements dominated by this VP only.<sup>26</sup> This difference will also matter in Section 6.1.

Hukari & Levine's examples involve a subject-object asymmetry. Interestingly, a similar subject-object asymmetry seems to exist in German, as Grewendorf (1985: 148) pointed out. The following example is based on his example:

	- b. \* In Marias Wohnung erwartete sie einen Blumenstrauß.
	     in Maria's flat waits she.NOM a.ACC bouquet
	  Intended: 'Maria waits for a bouquet in her flat.'

While the fronted adjunct can bind the object in (48a), binding the subject in (48b) is ruled out. Walker's proposals for English would not help in such examples, since in grammars of German, all arguments of finite verbs are represented in one valence list. Hence the highest domain in which vc-command is defined (taking Hukari & Levine's definition) is the full clause, since COMPS would be empty at this level. There is the additional problem that the adjunct is fronted in a nonlocal dependency (German is a V2 language; see Erdmann 1886: Chapter 2.4; Paul 1919: 69, 77; Müller 2015: Section 3) and that the arguments are scrambled in (48a). In the analysis of (48a) commonly assumed in HPSG grammars of German, there is no VP node, and it is unclear how a reconstruction of the fronted adjunct into a certain position could help explain the differences in (48).

Concluding this section, a totally non-configurational Binding Theory seems to be impossible because of adjuncts; a combination of configurational and non-configurational parts seems appropriate.

Section 6 discusses an alternative approach that collects indices in lists. This can be done in a way that gets the adjunct binding facts right.

# **4 Binding and passive: ARG-ST lists with internal structure**

Manning & Sag (1998) discuss binding in passive clauses. They suggest that the passive be analyzed as a lexical rule demoting the subject argument and adding an optional PP.<sup>27</sup>

<sup>26</sup>The situation is similar to the different versions of c-command in MGG. See footnote 7.

<sup>27</sup>See also Manning & Sag (1998: 114, 116), Müller (2003), Müller & Ørsnes (2013), and Blevins (2003: 512) for lexical rule-based analyses of the passive in English, German, Danish, Balto-Finnic and Balto-Slavic. Davis, Koenig & Wechsler (2021: Section 5.3), Chapter 9 of this volume give an overview.


(49) Lexical rule for the passive in English:

$$\left[ \text{ARG-ST}\ \left\langle \text{NP}_i \right\rangle \oplus \boxed{1} \right] \mapsto \left[ \text{ARG-ST}\ \boxed{1} \left( \oplus \left\langle \text{PP}[by]_i \right\rangle \right) \right]$$

The lexical rule applies to a verb with at least two arguments: the NP and 1 . It licenses the lexical item for the participle. The ARG-ST list of the participle does not contain the subject NP any longer, but instead, a PP object that is coindexed with the same argument is appended to the list. The lexical rule does not show the CONT value in the input and the output. A notational convention regarding lexical rules is that values of features that are not mentioned are taken over unchanged from the input. For our example, this means that linking is not affected. The index of the initial element in the input was linked to a certain role and this index – now associated with the PP – is linked to the same semantic role in the output. The PP does not assign a role. It just functions like one of the prepositional objects discussed on pages 900–901 above. The examples in (50) illustrate:

$$\begin{array}{lll} \text{(50)} & \text{a.} & \text{John disappointed himself.} \\ & & \text{ARG-ST}\ \left\langle \text{NP}_i, \text{NP}_j \right\rangle \quad \textit{disappoint}(i,j) \\ & \text{b.} & \text{John was disappointed by himself.} \\ & & \text{ARG-ST}\ \left\langle \text{NP}_j, \text{PP}[by]_i \right\rangle \quad \textit{disappoint}(i,j) \end{array}$$

(50a) shows the linking to the arguments of the finite verb *disappoint*; the subject is linked to the first argument and the object to the second. In the passive case in (50b), the logical object is realized as the subject but still linked to the second argument of *disappoint*. The former subject, now realized as a PP, is linked to the first argument.
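Read procedurally, the rule in (49) simply removes the first element of ARG-ST and optionally appends a coindexed *by*-PP. A minimal Python sketch of this list manipulation follows; the string encoding of synsem objects and indices is invented for illustration.

```python
# A procedural sketch of the lexical rule in (49), assuming ARG-ST
# is a plain list of strings with the index after an underscore.

def passivize(arg_st, with_by_phrase=True):
    """<NP_i> + rest  ->  rest (+ <PP[by]_i>), as in (49)."""
    subject, *rest = arg_st
    index = subject.split("_")[1]        # recover the index i
    out = list(rest)
    if with_by_phrase:                   # the by-PP is optional
        out.append(f"PP[by]_{index}")
    return out

# Active 'disappoint' has ARG-ST <NP_i, NP_j>, cf. (50a):
print(passivize(["NP_i", "NP_j"]))          # ['NP_j', 'PP[by]_i']
print(passivize(["NP_i", "NP_j"], False))   # ['NP_j']
```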

The passive example in (50b) would correspond to (51) if one simply put the reflexive in subject position:

(51) \* Himself disappointed John. ARG-ST ⟨NP<sub>i</sub>, NP<sub>j</sub>⟩, *disappoint*(*i*,*j*)

Of course (51) is ungrammatical because of the case of the reflexive pronoun: it is accusative and hence cannot function as subject (Brame 1977: 388). But the example would also be bad for binding reasons: the reflexive cannot bind a more oblique argument. In any case, the discussion shows that a purely thematic theory of binding would not work, since the semantic representation in the examples above is the same. It is the obliqueness of arguments that differs and this difference makes different binding options available.

So, the lexical rule-based approach to the passive makes the right predictions as far as the English data is concerned, but Perlmutter (1984) argued that more complex representations are necessary to capture the fact that some languages allow binding to the logical subject of the passivized verb. He discusses examples from Russian. While usually the reflexive has to be bound by the subject as in (52a), the antecedent can be either the subject or the logical subject in passives like (52b):

	- b. Eta kniga byla kuplena Borisom dlja sebja.
	     this book.NOM was bought Boris.INSTR for SELF
	  'This book was bought by Boris for himself.'

In order to capture the binding facts, Manning & Sag (1998) suggest that passives of verbs like *kupit'* 'buy' have the following representation, at least in Russian.


(53) *kuplena* 'bought':

$$\begin{bmatrix} \text{ARG-ST} & \left\langle \text{NP}[\textit{nom}]_j,\ \left\langle \text{NP}[\textit{instr}]_i,\ \text{PRO}_j,\ \text{PP}_k \right\rangle \right\rangle \\ \text{CONT} & \begin{bmatrix} \textit{buying} \\ \text{ACTOR} & i \\ \text{UNDERGOER} & j \\ \text{BENEFICIARY} & k \end{bmatrix} \end{bmatrix}$$

The ARG-ST list is not a simple list like the list for English; rather, it is nested. The complete ARG-ST list of the lexeme *kupit'* 'buy' is contained in the ARG-ST list of the passive. The logical subject is realized in the instrumental, and the logical object is stated as PRO on the embedded ARG-ST but as full NP in the nominative on the top-most ARG-ST list. This setup makes it possible to account for the fact that a long-distance reflexive (see p. 908) like the reflexive in the PP may refer to one of the two subjects: the nominative NP in the upper ARG-ST list and the NP in the instrumental in the embedded list. The PRO element is kept as a reflex of the argument structure of the lexeme. Such PRO elements also play a role in binding phenomena in languages like Chi-Mwi:ni, also discussed by Manning & Sag.

In order to facilitate distributing the elements of such nested ARG-ST lists to valence features like SUBJ and COMPS, Manning & Sag (1998: 124, 140) use a complex relational constraint that basically flattens the nested ARG-STs again and removes all occurrences of PRO. An alternative would be to keep the ARG-ST list for linking, case assignment and scope and use additional lists related to the ARG-ST list for binding. Such lists can contain PRO indices and additional indices for complex coordinations (see Section 6.2). An approach assuming additional lists is discussed in Section 6.
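A minimal sketch of such a flattening constraint, assuming nested ARG-STs are encoded as nested Python lists; the encoding is invented for illustration, and Manning & Sag state the constraint relationally over feature structures rather than procedurally.

```python
# Sketch of a relational constraint that flattens nested ARG-ST
# lists and removes PRO, along the lines described for
# Manning & Sag (1998: 124, 140).

def flatten_argst(arg_st):
    """Flatten embedded ARG-ST lists and drop PRO elements."""
    flat = []
    for element in arg_st:
        if isinstance(element, list):       # an embedded ARG-ST
            flat.extend(flatten_argst(element))
        elif not element.startswith("PRO"):
            flat.append(element)
    return flat

# The nested ARG-ST of 'kuplena' in (53):
nested = ["NP[nom]_j", ["NP[instr]_i", "PRO_j", "PP_k"]]
print(flatten_argst(nested))
# ['NP[nom]_j', 'NP[instr]_i', 'PP_k'] -- ready for SUBJ/COMPS mapping
```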


# **5 Austronesian: Disentangling ARG-ST and grammatical functions**

So far I have discussed binding for English with some occasional reference to Mandarin Chinese, Portuguese and German. The question is whether Binding Theory is universal, that is, whether it is a set of constraints holding for all languages, or whether language-specific solutions are necessary, maybe involving a general machinery for establishing such solutions. In this section, I explore approaches suggested for Austronesian languages.

Manning & Sag (1998) discuss data from Toba Batak, a Western Austronesian language. They assume that the ARG-ST elements are ordered with the actor first and the undergoer second, but since Toba Batak has two ways to realize arguments, the so-called *active voice* and the *objective voice*, either of the arguments can be the subject.

(54) a. Mang-ida si Ria si Torus. (Toba Batak)
     AV-see PM Ria PM Torus
     'Torus sees/saw Ria.'

b. Di-ida si Torus si Ria.
   OV-see PM Torus PM Ria
   'Torus sees/saw Ria.'

Manning & Sag argue that the verb and the adjacent NP form a VP which is combined with the final NP to yield a full clause. They further argue that neither sentence in (54) is a passive or anti-passive variant of the other. Instead, they suggest that the two variants are simply due to different mappings from argument structure (ARG-ST) to surface valence (SUBJ and COMPS). They provide the following lexical items:

(55) a. *mang-ida* 'AV-see':

$$\begin{bmatrix} \text{PHON} & \left\langle \textit{mang-ida} \right\rangle \\ \text{SUBJ} & \left\langle \boxed{1} \right\rangle \\ \text{COMPS} & \left\langle \boxed{2} \right\rangle \\ \text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}_i, \boxed{2}\,\text{NP}_j \right\rangle \\ \text{CONT} & \begin{bmatrix} \textit{seeing} \\ \text{ACTOR} & i \\ \text{UNDERGOER} & j \end{bmatrix} \end{bmatrix}$$

b. *di-ida* 'OV-see':

$$\begin{bmatrix} \text{PHON} & \left\langle \textit{di-ida} \right\rangle \\ \text{SUBJ} & \left\langle \boxed{2} \right\rangle \\ \text{COMPS} & \left\langle \boxed{1} \right\rangle \\ \text{ARG-ST} & \left\langle \boxed{1}\,\text{NP}_i, \boxed{2}\,\text{NP}_j \right\rangle \\ \text{CONT} & \begin{bmatrix} \textit{seeing} \\ \text{ACTOR} & i \\ \text{UNDERGOER} & j \end{bmatrix} \end{bmatrix}$$

The order of the elements in the ARG-ST list corresponds to the grammatical functions as realized in the active voice. The analysis of (54b) is given in Figure 9.


Since the second argument, the logical object and undergoer, is mapped to SUBJ in (55b), it is combined with the verb last.


Figure 9: Analysis of the Toba Batak example in objective voice according to Manning & Sag (1998: 120)

But since binding is taken care of at the ARG-ST list and this list is not affected by voice differences, this account correctly predicts that the binding patterns do not change, regardless of how the arguments are realized. As the following examples show, it is always the logical subject, the actor (the initial element on the ARG-ST list), that binds the non-initial one.
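The division of labour just described can be sketched as follows: the ARG-ST list is fixed (actor first), and voice only determines the mapping to SUBJ and COMPS. The string encoding is invented for illustration.

```python
# Sketch of the two ARG-ST -> surface valence mappings in (55).

def map_valence(arg_st, voice):
    actor, undergoer = arg_st
    if voice == "AV":      # active voice: actor is the subject
        return {"SUBJ": [actor], "COMPS": [undergoer]}
    if voice == "OV":      # objective voice: undergoer is the subject
        return {"SUBJ": [undergoer], "COMPS": [actor]}
    raise ValueError(f"unknown voice: {voice}")

arg_st = ["NP_i:actor", "NP_j:undergoer"]
print(map_valence(arg_st, "AV"))
# {'SUBJ': ['NP_i:actor'], 'COMPS': ['NP_j:undergoer']}
print(map_valence(arg_st, "OV"))
# {'SUBJ': ['NP_j:undergoer'], 'COMPS': ['NP_i:actor']}
# Binding consults ARG-ST only, which is identical in both cases.
```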

(56) a. [Mang-ida diri-na] si John. (Toba Batak)
     AV-saw self-his PM John
     'John saw himself.'


	- b. [Di-ida si John] diri-na.
	     OV-saw PM John self-his
	  'John saw himself.'

Manning & Sag (1998: 121) point out that theories relying on tree configurations will have to assume rather complex tree structures for one of the patterns in order to establish the required c-command relations. This is unnecessary for ARG-ST-based Binding Theories.

Wechsler & Arka (1998) discuss similar data from Balinese and provide a parallel analysis. This analysis is also discussed in Davis, Koenig & Wechsler (2021: Section 3.3), Chapter 9 of this volume. The analysis is similar to what was just shown for Toba Batak: the elements on the ARG-ST list of simplex predicates are ordered according to the thematic hierarchy as suggested by Jackendoff (1972). But there is an important additional aspect that was already discussed in Section 1 with respect to English: ARG-ST-based theories work for raising examples as well. So even though raised elements do not get a semantic role from the head they are raised to, they can be bound by arguments of this head. Wechsler (1999: 189–190) illustrates this with the following examples:

	- b. Awak cange / \*cang kaden cang suba mati. (Balinese)
	     myself / me OV.think 1sg already dead
	  'I believed myself/\*me to be dead already.'
	- c. ARG-ST of 'AV/OV.think': ⟨ NP, 1 NP:*ana*/*ppro*∗, XP[SUBJ ⟨ 1 ⟩] ⟩
	- d. 'dead': SUBJ ⟨ 1 ⟩, ARG-ST ⟨ 1 NP ⟩

Even though *awak cange* 'myself' is the subject of *mati* 'dead' and raised to the second position of the ARG-ST of 'think' both in the agentive and the objective voice ( 1 ), this element has to be a reflexive rather than a pronoun, as predicted by an ARG-ST-based theory. As the examples in (58a,b) show, this is independent of the realization in agentive or objective voice.

When a two-place verb is embedded under a raising predicate, the downstairs verb may be realized in objective voice. The raised element will be the object of the embedded verb. As the examples in (59) show, the raised object can be an anaphor, but not a full pronoun, bound by the subject of the raising verb. This is independent of the realization of the raising verb in agentive or objective voice:

	- b. Awakne / Ia∗ tawang=a lakar tangkep polisi.
	     self / 3rd OV.know=3 FUT OV.arrest police
	  'He knew that the police would arrest self/him∗.'
	- c. ARG-ST of 'AV/OV.know': ⟨ NP, 1 NP:*ana*/*ppro*∗, VP[SUBJ ⟨ 1 ⟩] ⟩
	- d. 'OV.arrest': SUBJ ⟨ 1 ⟩, ARG-ST ⟨ NP, 1 NP ⟩

(60) and (61) show the case in which the ARG-ST subject of 'think' is first person singular and the raised ARG-ST object is a third person singular pronoun. The object of 'see' has to be an anaphor rather than a personal pronoun, since it is local to the subject of 'see'. This is independent of the realization of the first two ARG-ST elements as subject or object of 'think'. (60) is the case in which the embedded verb is in agentive voice and hence the subject of 'see' is raised, and in (61), 'see' is in objective voice and hence the object of 'see' is raised.

	- b. Ia kaden cang suba ningalin awakne / ia∗.
	     3rd OV.think 1sg already AV.see self / 3rd
	  'I believe him to have seen himself.'
	- c. ARG-ST of 'AV/OV.think': ⟨ NP, 1 NP, XP[SUBJ ⟨ 1 ⟩] ⟩
	- d. AV 'see': SUBJ ⟨ 1 ⟩, ARG-ST ⟨ 1 NP, NP:*ana*/*ppro*∗ ⟩


	- b. Awakne kaden cang suba tingalin=a.
	     self OV.think 1sg already OV.see=3
	  'I believe him to have seen himself.'
	- c. ARG-ST of 'AV/OV.think': ⟨ NP, 1 NP, XP[SUBJ ⟨ 1 ⟩] ⟩
	- d. OV 'see': SUBJ ⟨ 1 ⟩, ARG-ST ⟨ NP, 1 NP:*ana*/*ppro*∗ ⟩

As predicted by an ARG-ST-based Binding Theory, the bindings are independent of the realization in agentive or objective voice.


As many researchers have pointed out (van Noord & Bouma 1997: Section 5; Müller 1999b: Section 20.4.2), there is some slight imprecision when it comes to the scope of the binding principles. Principle A says that a locally o-commanded anaphor must be locally o-bound. But in raising constructions, there may be several lists on which an anaphor is locally o-commanded. Wechsler (1999) resolves this imprecision and assumes an existential version of Principle A, according to which a locally o-commanded anaphor has to be locally o-bound on *some* ARG-ST. In the example in (61), the respective ARG-ST list is the one of 'see'. In contrast, a universal interpretation is assumed for Principle B: a pronominal must be locally o-free in all ARG-ST lists in which it appears.
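The existential/universal asymmetry can be sketched as follows, assuming elements are encoded as (form, index) pairs and an element may occur on several ARG-ST lists; the encoding is invented for illustration.

```python
# Sketch of Wechsler's (1999) existential Principle A vs. the
# universal interpretation of Principle B over the collection of
# ARG-ST lists a raised element appears on.

def binds(a, b):
    """a binds b iff the two are coindexed (same index tag)."""
    return a[1] == b[1]

def locally_o_bound(x, arg_st):
    """x is bound by some less oblique co-argument on this list."""
    if x not in arg_st:
        return False
    return any(binds(c, x) for c in arg_st[: arg_st.index(x)])

def principle_a_ok(anaphor, arg_sts):
    """Existential: locally o-bound on SOME ARG-ST list."""
    return any(locally_o_bound(anaphor, l) for l in arg_sts)

def principle_b_ok(pronoun, arg_sts):
    """Universal: locally o-free on ALL ARG-ST lists."""
    return not any(locally_o_bound(pronoun, l) for l in arg_sts)

# (61): the object of 'see' is raised onto the ARG-ST of 'think';
# it is bound on the ARG-ST of 'see', so Principle A is satisfied:
refl  = ("awakne", "j")
think = [("cang", "i"), refl, ("XP", "-")]
see   = [("NP", "j"), refl]
print(principle_a_ok(refl, [think, see]))   # True
print(principle_b_ok(refl, [think, see]))   # False: a pronoun is out
```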

Wechsler (1999) compares GB analyses with ARG-ST-based HPSG analyses and shows that the GB analysis, which may seem to be parallel to the HPSG analysis, does not extend to the Balinese facts but results in an insoluble contradiction. In contrast, the lexical, ARG-ST-based HPSG Binding Theory together with a mapping from ARG-ST to grammatical functions gets the facts right without any further stipulations.

# **6 Explicit constructions of lists with possible antecedents**

It was mentioned on p. 893 that HPSG sees binding as crucially different from nonlocal dependencies, while in GB the relation between a trace and its filler was seen as similar to pronoun binding. This section explains how the general mechanism for nonlocal dependencies (see Borsley & Crysmann 2021, Chapter 13 of this volume) can be used to account for binding data and in which way this solves or avoids problems of earlier approaches based on o-command. The idea to use the nonlocal mechanism was first suggested by Bredenkamp (1996: Section 7.2.3). He did not work out his proposal in detail (see pp. 104–105). He used the SLASH feature for percolation of binding information, which probably would result in conflicts with true nonlocal dependencies. Hellan (2005) developed an account using special nonlocal features for binding information. Both Bredenkamp and Hellan assume that the binding information is bound off in certain structures, as is common in the treatment of nonlocal dependencies in HPSG. In what follows, I look into Branco's (2002) account. Branco also uses the nonlocal machinery of HPSG but in a novel way, without something like a filler-head schema. Before looking into the details, I want to discuss two phenomena that have not been accounted for so far and that are problematic for a Binding Theory based on o-command: first, there is nothing that rules out nominal heads as binders, and second, there are problems with coordinations. Both problems can be solved if there is a bit more control of which indices are involved in binding relations in which local environment.

# **6.1 Nominal heads as binders**

Pollard & Sag's (1994) definition of o-command has an interesting consequence: it does not say anything about possible binding relations between heads and their dependents. What is regulated is the binding relations between co-arguments and referential objects dominated by a more oblique co-argument. As Müller (1999b: 419) pointed out, bindings like the one in (62) are not ruled out by the Binding Theory of Pollard & Sag (1994: Chapter 6):

(62) his∗ father

The possessive pronoun is selected via SPR and hence a dependent of *father* (Müller 2021b; Machicao y Priemer & Müller 2021; Wechsler 2021: 230, Chapter 6 of this volume), but the noun does not appear in any ARG-ST list (assuming an NP analysis, see also Van Eynde 2021, Chapter 8 of this volume for discussion). The consequence is that Principles B and C do not apply, and the o-command-based Binding Theory simply does not have anything to say about (62). This problem can be fixed by assuming Hukari & Levine's (1995) version of Principle C together with their definition of vc-command in (45). This would also cover cases like (63):<sup>28</sup>

<sup>28</sup>Giuseppe Varaschin pointed out to me that many *i*-within-*i* violations may be due to semantic/pragmatic constraints. So *his father* would be a person X such that X is a father of X. Since 'father' is an irreflexive predicate, the binding would clash with our expectations. Culicover (1997: 71) discusses the following example:

(i) One finds [many books about themselves] on Borges's literary output.

So maybe large parts of the explanation of *i*-within-*i* effects can be found in semantics/pragmatics.


(63) his∗ father of John

What is not accounted for so far are Fanselow's (1986: 344) examples in (64):

	- b. der Besitzer seines∗ Bootes
	     the owner of.his boat
	  'the owner of his boat'

These examples would be covered by an *i*-within-*i*-Condition as suggested by Chomsky (1981: 212). Chomsky's condition basically rules out configurations like the one in (65):

(65) ( … x<sub>*i*</sub> … )<sub>*i*</sub>

Pollard & Sag (1994: 244) consider the *i*-within-*i*-Condition in their discussion of GB's Binding Theory but do not assume anything like this in their papers. Nor was anything of this kind adopted anywhere else in the discussion of binding. Having such a constraint could be a good solution, but as Fanselow (1986: 343), working in GB, pointed out, such a condition would also rule out cases like his examples in (66):

	- b. die einander verachtenden Männer
	     the each.other despising men
	  'the men who despise each other'

German allows for complex prenominal adjectival phrases. The subject of the respective adjectives or adjectival participles is coindexed with the noun that is modified. Since the reflexive and reciprocal in (66) are coindexed with the non-expressed subject, and since this subject is coindexed with the modified noun (Müller 2002: Section 3.2.7), a general *i*-within-*i*-Condition cannot be formulated for HPSG grammars of German. The problem also applies to English, although English does not have complex prenominal adjectival modifiers. Relative clauses basically produce a similar configuration:

	- b. That woman listening to her own voice on the radio is Barbra Streisand.<sup>29</sup>

<sup>29</sup>Varaschin (2021: 50)


The non-expressed subject in (67a) is the antecedent for *herself*, and since this element is coindexed with the antecedent noun of the relative clause, we have a parallel situation. Similarly, the subject of *listening* is the antecedent of *her*.

Chomsky (1981: 229, Fn. 63) notes that his formulation of the *i*-within-*i*-Condition rules out relative clauses and suggests a revision. However, the revised version would not rule out the examples in (62)–(64) above either, so it does not seem to be of much help.

In a version of the Binding Theory that is based on command relations in tree configurations, some special constraint seems to be needed that rules out binding by and to the head of nominal constructions unless this binding is established by adnominal modifiers directly. The approach to binding discussed below accounts for *i*-within-*i* problems by explicitly collecting indices that are possible antecedents and excluding the unwanted indices in this collection. But before we look into the details, I want to discuss another area that is problematic for tree-configurational approaches in general, not just for the HPSG approach based on o-command.

# **6.2 Binding and coordination: Questions of locality**

Müller (1999b: Section 20.4.7) pointed out that examples like (68) involving anaphors within coordinations are problematic for the HPSG Binding Theory:

(68) Wir beschreiben ihm [sich und seine Familie].
     we describe him SELF and his family
     'We describe him and his family to him.'

Since *sich* 'SELF' is not local to *ihm* 'him' and since reflexives are not exempt in German (Kiss 2012: 158–159), *ihn* 'him' would be expected as the only option for a pronominal element within the coordination.

Fanselow (1987: 112) discussed such examples in the context of a GB-style Binding Theory. See also Müller (1999b: 420) for attested examples. Such sentences pose a challenge for the way locality is defined as part of the definition of local o-command. Local o-command requires that the commander and the commanded phrase are members of the same ARG-ST list (11), but the result of coordinating two NPs is usually a complex NP with a plural index:

(69) Der Mann und die Frau kennen / \*kennt das Kind.
     the man and the woman know / knows the child
     'The man and the woman know the child.'


The NP *der Mann und die Frau* 'the man and the woman' is an argument of *kennen* 'to know'. The index of *der Mann und die Frau* 'the man and the woman' is local with respect to *das Kind* 'the child'. The indices of *der Mann* 'the man' and *die Frau* 'the woman' are embedded in the complex NP.

For the same reason, *sich* is not local to *ihm* in (68). This means that the anaphor is not locally o-commanded in any of the sentences, and hence Binding Theory does not say anything about the binding of the reflexive in these sentences: the anaphors are exempt.

For the same reason, *ihn* 'him' is not local to *er* 'he' in (70b), and hence the binding of *ihn* 'him' to *er* 'he', which should be excluded by Principle B, is not ruled out.<sup>30</sup>

	- b. Er sorgt nur für [ihn∗ und seine Familie].
	     he cares only for him and his family

Reinhart & Reuland (1993) develop a Binding Theory that works at the level of syntactic or semantic predicates. Discussing the examples in (71), they argue that the semantic representation is (72) and hence their semantic restrictions on reflexive predicates apply.

(71) a. The queen invited both Max and herself to our party.

b. \* The queen<sub>1</sub> invited both Max and her<sub>1</sub> to our party.

(72) the queen (λx (x invited Max & x invited x))

Such an approach solves the problem for coordinations with *both … and …* having a distributive reading. Reinhart & Reuland (1993: 677) explicitly discuss coordinations with a collective reading. Since we have a collective reading in examples like (70), such examples continue to pose a problem. There are, however, ways to cope with such data: one is to assume a construction-based account of binding domains. The details of an account that makes this possible will be discussed in the following subsection.

<sup>30</sup>If one assumed transformational theories of coordination deriving (69) from (i) below (see for example Wexler & Culicover 1980: 303 and Kayne 1994: 61, 67 for proposals to derive verb coordination from VP coordination plus deletion), the problem would be solved. However, as has been pointed out frequently in the literature, such transformation-based theories of coordination have many problems (Bartsch & Vennemann 1972: 102; Jackendoff 1977: 192–193; Dowty 1979: 143; den Besten 1983: 104–105; Klein 1985; Eisenberg 1994; Borsley 2005: 471), and nobody has ever assumed something parallel in HPSG (see Abeillé & Chaves (2021), Chapter 16 of this volume on coordination in HPSG).

(i) Er sorgt nur für sich und er sorgt nur für seine Familie.
    he cares only for SELF and he cares only for his family

# **6.3 The list-threading approach to binding**

The discussion of early HPSG approaches to binding revealed a number of problems. The proposals are based on tree configurations and on command relations. This is basically the conceptual inheritance of the GB Binding Theory, of course with a lot of improvements. The general problem seems to be that the command relations are defined in a uniform way, without taking into account special configurations such as coordinate structures.

There is a more recent approach to binding that looks technical at first, but it solves the problems caused by assuming a single command relation that is supposed to work for all structures in all languages. Branco (2002) suggested an approach that collects indices that are available for binding in certain binding domains.<sup>31</sup> The ways in which these indices are collected can be specified with reference to particular constructions, allowing the problems mentioned so far to be circumvented.

Branco (2002) argues that sentences with wrong bindings of pronouns and/or reflexives are not syntactically ill-formed, but rather semantically deviant. For the representation of his Binding Theory, he assumes Underspecified Discourse Representation Theory (UDRT; Reyle 1993; Frank & Reyle 1995) as the underlying formalism for semantics (see also Koenig & Richter 2021: Section 6, Chapter 22 of this volume).

Similar to the notions assumed in Minimal Recursion Semantics (MRS; Copestake, Flickinger, Pollard & Sag 2005; see also Koenig & Richter 2021: Section 6.1, Chapter 22 of this volume for an introduction to MRS), there is an attribute for distinguished labels that indicate the upper (L-MAX) and lower (L-MIN) bounds for quantifier scope, and there is a set of subordination conditions for quantifier scope (the HCONS set in MRS), as well as a list of semantic conditions (the RELS set in MRS). In addition, Branco suggests a feature ANAPH(ORA) for handling the Binding Theory constraints. Information about the anaphoric potential of nominals is represented there. There is a reference marker represented under R(EFERENCE)-MARK(ER), and there is a list of reference markers under ANTEC(EDENTS). The list is set up so that it contains the antecedent candidates of a nominal element. Furthermore, Branco adds special lists containing antecedents for special types of anaphora. The lists are named after the binding principles that were already discussed in previous sections: LIST-A contains all reference markers of elements that locally o-command a certain nominal expression *n*, ordered with respect to their obliqueness, and LIST-Z contains all o-commanders, also including everything from LIST-A. The elements in LIST-Z may come from various embedded clauses and are also ordered with respect to their obliqueness. The list LIST-U contains all the reference markers in the discourse context, including those not linguistically introduced. The list LIST-LU is an auxiliary list that will be explained below.

<sup>31</sup>For a much more detailed overview of Branco's approach, see Branco (2021).

(73) The feature geometry assumed by Branco (2002):

$$\begin{bmatrix} \text{LOC|CONT} & \begin{bmatrix} \text{L-MAX} & \dots \\ \text{L-MIN} & \dots \\ \text{SUBORD} & \{\dots\} \\ \text{CONDS} & \{\dots\} \\ \text{ANAPH} & \begin{bmatrix} \text{R-MARK} & \textit{ref-mark} \\ \text{ANTEC} & \textit{list(ref-mark)} \end{bmatrix} \end{bmatrix} \\ \text{NONLOC|BIND} & \begin{bmatrix} \textit{bind} \\ \text{LIST-A} & \textit{list(ref-mark)} \\ \text{LIST-Z} & \textit{list(ref-mark)} \\ \text{LIST-U} & \textit{list(ref-mark)} \\ \text{LIST-LU} & \textit{list(ref-mark)} \end{bmatrix} \end{bmatrix}$$

The lists containing possible antecedents for various nominal elements are represented under NONLOCAL as the value of a newly introduced feature BIND. These binding lists differ from other NONLOCAL features in that nothing is ever removed from them (for unbounded dependencies and NONLOCAL features in general, see Borsley & Crysmann (2021), Chapter 13 of this volume). Before I provide the principles that determine the list values, I will explain them with an example. Figure 10 shows the relevant aspects of the analysis of (74):

(74) Every student thought that she saw herself.

The noun phrase *every student* introduces the reference marker (R-MARK) 3 for e-type anaphora (Evans 1980) and, as the value of VAR, the value used for bound-variable anaphora interpretations (Reinhart 1983). This is 2 in the example. The pronouns *she* and *herself* introduce the reference markers 4 and 5 respectively. All these reference markers are added to the bookkeeping list LIST-LU of the respective lexical items: *she* has 4 in its LIST-LU, and *herself* has 5 in this list. The noun phrase *every student* has both the variable 2 and the reference marker ( 3 ) in its LIST-LU. As can be seen by looking at the individual nodes in Figure 10, the elements of LIST-LU in the daughters are collected at the mother node. The element *ctx* is an empty element that stands for the non-linguistic context. It is combined with one or more sentences to form a text fragment (see also Lücking, Ginzburg & Cooper (2021), Chapter 26 of this volume for discourse models and HPSG). The CONDS list of the *ctx* element contains semantic relations that hold of the world, and all reference markers contained in these relations are also added to the LIST-LU list. In the example, this is just 1 . The example shows just one sentence that is combined with the empty head, but in principle there can be arbitrarily many sentences. The LIST-LU list at the top node contains all reference markers contained in all sentences and the non-linguistic context.

The top node of Figure 10 is licensed by a schema that also identifies the LIST-U value with the LIST-LU value. The LIST-U value is shared between mothers and their daughters, and since LIST-LU is a collection of all referential markers in the tree and this collection is shared with LIST-U at the top node, it is ensured that all nodes have a LIST-U value that contains all reference markers available in the whole discourse. In our example, all LIST-U values are ⟨ 1 , 2 , 3 , 4 , 5 ⟩.
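The bottom-up collection of LIST-LU and its identification with LIST-U at the top node can be sketched as follows; the tuple-based tree encoding is invented for illustration.

```python
# Sketch of bottom-up LIST-LU collection (Branco 2002): each node's
# LIST-LU concatenates its own markers with those of its daughters;
# the top node identifies LIST-U with LIST-LU.

def collect_lu(node):
    """node = (own_markers, daughters); returns its LIST-LU."""
    own, daughters = node
    return own + [m for d in daughters for m in collect_lu(d)]

ctx      = ([1], [])                     # non-linguistic context
every_st = ([2, 3], [])                  # VAR 2 and R-MARK 3
she      = ([4], [])
herself  = ([5], [])
sentence = ([], [every_st, ([], [she, herself])])
top      = ([], [ctx, sentence])

list_u = collect_lu(top)                 # identified with LIST-U
print(list_u)                            # [1, 2, 3, 4, 5]
```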

LIST-A values are determined with respect to the argument structures of governing heads. So the LIST-A value of *thought* is ⟨ 2 , 3 ⟩, and the one of *saw* is ⟨ 4 , 5 ⟩. The LIST-A values of NP or PP arguments are identical to the ones of the head, hence *she* and *herself* have the same LIST-A value as *saw*, and *every student* has the same LIST-A value as *thought*. Apart from this, the LIST-A value is projected along the head path in non-nominal and non-prepositional projections. For further cases, see Branco (2002: 77).

The value of LIST-Z is determined as follows (Branco 2002: 77): for all sentences combined with the context element, the LIST-Z value is identified with the LIST-A value. Therefore, the LIST-Z value of *every student thought that she saw herself* is ⟨ 2 , 3 ⟩: the LIST-A value is projected from *thought* and then identified with the LIST-Z value. In sentential daughters that are not at the top-level, the LIST-Z value is the concatenation of the LIST-Z value of the mother and the LIST-A value of the sentential daughter. In other non-filler daughters of a sign, the LIST-Z value is structure shared with the LIST-Z value of the sign. For example, *she* and *saw* and *herself* have the same LIST-Z value, namely ⟨ 2 , 3 , 4 , 5 ⟩.

Branco (2002: 78) provides the lexical item in (75) for a pronoun. The interesting thing about the analysis is that all information that is needed to determine possible binders of the pronoun is available in the lexical item of the pronoun. The relational constraint principleB takes as input the LIST-A list 3 , the LIST-U list 4 and the reference marker of the pronoun under consideration ( 2 ). The result of the application of principleB is the list of reference markers that does not contain elements locally o-commanding the pronoun, since all o-commanders of the reference marker 2 , which are contained in the LIST-A, are removed from LIST-U (the list of all reference markers in the complete discourse). In the case of *she* in our example, principleB returns the complete discourse list ⟨ 1 , 2 , 3 , 4 , 5 ⟩ minus all reference markers of elements less oblique than 4 (the empty list, since 4 is the first element of ⟨ 4 , 5 ⟩ in Figure 10), minus 4 itself, since the pronoun is not a possible antecedent of itself. So, the list of possible antecedents of *she* is ⟨ 1 , 2 , 3 , 5 ⟩. This list contains 5 as a possible binder, which is of course unwanted. According to Branco (2002: 84), *herself* as a binder of *she* is ruled out, since *she* binds *herself*.

The SYNSEM value for *herself* is shown in (76). LIST-A contains the reference markers of locally o-commanding phrases ( 3 ). Together with the reference marker of *herself* ( 2 ), 3 is the input to the relational constraint principleA. This constraint returns a list containing all possible binders for 2 , that is, all elements of 3 that are less oblique than 2 . If there is no such element, the returned list is the empty list and the anaphor is exempt (see Section 2.3).
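The two relational constraints can be sketched as list operations over reference markers; the numbers correspond to the markers of example (74), and the encoding is invented for illustration.

```python
# Sketch of Branco's relational constraints principleB and
# principleA as operations over lists of reference markers
# (cf. Branco 2002: 77-78).

def principle_b(list_a, list_u, r):
    """Antecedent candidates for a pronoun r: everything in the
    discourse except r itself and its local o-commanders."""
    local_o_commanders = list_a[: list_a.index(r)] if r in list_a else []
    return [m for m in list_u if m != r and m not in local_o_commanders]

def principle_a(list_a, r):
    """Antecedent candidates for an anaphor r: its local
    o-commanders; an empty result means the anaphor is exempt."""
    return list_a[: list_a.index(r)] if r in list_a else []

# Figure 10 / example (74): markers 1..5; 'she' = 4, 'herself' = 5.
list_u = [1, 2, 3, 4, 5]
print(principle_b([4, 5], list_u, 4))   # [1, 2, 3, 5] -- as in the text
print(principle_a([4, 5], 5))           # [4] -- 'she' binds 'herself'
```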

The example discussed here involves a personal pronoun and a reflexive. The antecedents were determined by the relational constraints principleB and principleA. Further relational constraints are assumed for long-distance reflexives (principleZ) and normal referential NPs (principleC). principleC is part of the description of the specifier used in non-lexical anaphoric nominals (Branco 2002: 79).


(76) Parts of the SYNSEM value for *herself*:

The setting-up of the LIST-A and LIST-U lists is flexible enough to take care of problems that are unsolvable in the standard HPSG approach (and in GB approaches). For example, the LIST-U list of a noun phrase can be set up in such a way that the reference marker of the whole NP, which is introduced by the specifier, is not contained in the LIST-U list of the N that is combined with it. As pointed out by Branco (2002: 76), this solves *i*-within-*i* puzzles, which were discussed in Section 6.1.

Note also that this flexibility in determining the lists of possible local antecedents on a construction-specific basis makes it possible for the first time to account for puzzling data like the coordination data discussed in Section 6.2. If the coordination analysis standardly assumed in HPSG (see Abeillé & Chaves 2021, Chapter 16 of this volume) is on the right track, a special rule for licensing coordination is needed, and this rule can also incorporate the proper specification of binding domains with respect to coordination.

Summing up, it can be said that the lexical, list-based solution discussed in this last section provides flexibility in defining binding domains and can cope with the *i*-within-*i* problem and problems of locality.

# **7 Conclusion**

I have discussed several approaches to Binding Theory in HPSG. It was shown that the valence-based approach that refers to the ARG-ST list of lexical items has advantages over proposals that exclusively refer to tree configurations. Since tree configurations play a minor role in HPSG's Binding Theory, binding data does not force syntacticians to assume structures branching in a certain way. This sets HPSG apart from theories like Government & Binding and Minimalism, in which empty nodes are assumed for sentences with ditransitive verbs in order to account for binding facts (Borsley & Müller 2021: 1284–1286, Chapter 28 of this volume).

A further highlight is the treatment of so-called exempt anaphors, that is, anaphors that are not commanded by a possible antecedent. Pollard & Sag (1992) argued that these anaphors should not be regarded as constrained by the Binding Theory and hence that binding by antecedents outside of the clause or the projection is possible.

Finally, a lexical approach to binding that makes all the relevant binding information available locally within lexical items of pronouns/reflexives/reciprocals was discussed. This approach is flexible enough to deal with problematic aspects like the *i*-within-*i* situations and locality problems in coordinated structures.

# **Abbreviations**


# **Acknowledgments**

I thank Anne Abeillé, Bob Borsley, Jean-Pierre Koenig, Giuseppe Varaschin and Steve Wechsler for detailed comments on earlier versions of the paper and Bob Levine and Giuseppe Varaschin for discussion. I thank Elizabeth Pankratz for proofreading.

# **References**

Abeillé, Anne. 2021. Control and Raising. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 489–535. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599840.









Evans, Gareth. 1980. Pronouns. *Linguistic Inquiry* 11(2). 337–362.









ogy and Syntax), 1497–1553. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599882.





Syntax), 275–313. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599832.




# **Part III**

# **Other levels of description**

# **Chapter 21**

# **Morphology**

# Berthold Crysmann

Centre national de la recherche scientifique (CNRS)

This chapter provides an overview of work on morphology within HPSG. Following a brief discussion of how morphology relates to the issue of lexical redundancy, and in particular horizontal redundancy, I map out the historical transition from meta-level lexical rules of derivational morphology and grammatical function change towards theories that are more tightly integrated with the hierarchical lexicon (Riehemann 1998; Koenig 1999). After a discussion of fundamental issues of inflectional morphology and the kind of models these favour, the chapter summarises previous HPSG approaches to the issue and finally provides an introduction to Information-based Morphology (Crysmann & Bonami 2016), a realisational model of morphology that systematically exploits HPSG-style underspecification in terms of multiple inheritance hierarchies.

# **1 Introduction**

Lexicalist approaches to grammar, such as HPSG, typically combine a fairly general syntactic component with a rich and articulate lexicon. While this makes for a highly principled syntactic component – e.g. the grammar fragment of English presented in Pollard & Sag (1994) contains only a handful of principles together with six rather general phrase structure schemata – this decision places quite a burden on the lexicon, an issue known as lexical redundancy.

Lexical redundancy comes in essentially two varieties: vertical redundancy and horizontal redundancy. Vertical redundancy arises because many lexical entries share a great number of syntactic and semantic properties: e.g. in English (and many other languages) there is a huge class of strictly transitive verbs which display the same valency specifications, the same semantic roles, and the same linking patterns. From the outset, HPSG successfully eliminates vertical redundancy by means of multiple inheritance networks over typed feature structures (Flickinger et al. 1985).

The problem of horizontal redundancy is associated with systematic alternations in the lexicon: these include argument-structure alternations, such as resultatives or the causative-inchoative alternation, as well as classical instances of grammatical function change, such as passives, applicatives or causatives. The crucial difference with respect to vertical redundancy is that we are not confronted with what is essentially a classificational problem – assigning lexical items to a more general class and inheriting its properties –, but rather with a relation between lexical items. Morphological processes, both in word formation and inflection, crucially involve this latter type of redundancy: for example, in the case of deverbal adjectives in *-able*, we find a substantial number of derivations that show systematic changes in form, paired with equally systematic changes in grammatical category, meaning, and valency (Riehemann 1998). In inflection, change in morphosyntactic properties, e.g. case or agreement marking, is often signalled by a change in shape, which means the generalisation to be captured is about the contrast of form and morphosyntactic properties between fully inflected words.

Following Bresnan (1982b), the classical way to attack the issue of horizontal redundancy in HPSG is by means of lexical rules (Flickinger 1987). Early HPSG embraced Bresnan's original conception of lexical rules as mappings between lexical items. To a considerable extent,<sup>1</sup> work on morphology and, in particular, derivational morphology has led to a reconceptualisation of lexical rules within HPSG: now, they are understood as partial descriptions of lexical items that are fully integrated into the hierarchical lexicon (Koenig 1999). As such, they are amenable to the same underspecification techniques that are used to generalise across classes of basic lexical items.

The chapter is structured as follows: in Section 2, I shall present the main developments towards an inheritance-based view of derivational morphology within HPSG and provide pointers to concrete work within HPSG and beyond that has grown out of these efforts. In Section 3, I shall discuss inflectional morphology, starting with an overview of the classical challenges (Section 3.1) and assess how the different types of inflectional theories – Item-and-Arrangement (IA), Item-and-Process (IP), and Word-and-Paradigm (WP) – fare with respect to these basic challenges (Section 3.2). Against this backdrop, I shall discuss previous work on inflection in HPSG (Section 3.3). Section 4 will be devoted to an introduction of Information-based Morphology, a recently developed HPSG subtheory of inflectional morphology.

<sup>1</sup>See also the work by Meurers (2001), providing a formal description-level formalisation of lexical rules, as standardly used in HPSG.

# **2 Inheritance-based approaches to derivational morphology**

# **2.1 Krieger & Nerbonne (1993)**

Probably the first attempt at a more systematic treatment of morphology is the approach by Krieger & Nerbonne (1993). They note that meta-level lexical rules, as conceived of at the time, move the description of lexical alternations, which are characteristic of morphology, outside the scope of lexical inheritance hierarchies. Consequently, they explore how morphology can be made part of the lexicon. They observe that inflection and derivation differ most crucially with respect to the finiteness of the domain: while inflection is essentially finite (modulo case stacking; Sadler & Nordlinger 2006; Malouf 2000), derivation need not be: they cite repetitive prefixation in German as the decisive example (*Silbe* 'syllable', *Vor-silbe* 'pre-syllable', *Vor-vor-silbe* 'pre-pre-syllable', etc.). They therefore propose modelling derivation by means of morphological rule schemata, which are underspecified descriptions of complex lexemes, and integrating them as part of the lexical hierarchy. They adopt a word-syntactic approach akin to Lieber (1992), where affixes are treated as signs that select the bases with which they combine. They propose a number of principles that govern headedness, subcategorisation, and semantic composition. What is special is that all these principles are represented as types in the lexical type hierarchy, cf. Davis & Koenig (2021), Chapter 4 of this volume. Concrete derivational rule schemata will then inherit from these supertypes. What this amounts to is that different subclasses of derivational processes may be subject to all or only a subset of these principles. They briefly discuss conversion, i.e. zero derivation, and suggest that this could be incorporated by means of unary rules.
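The recursion argument can be made concrete with a short sketch: a derivational rule, unlike an inflectional paradigm, can reapply to its own output, so the set of derivable lexemes is unbounded. The dictionary encoding below is invented for illustration.

```python
# Sketch of recursive derivation: German 'Vor-' prefixation can
# reapply to its own output, which is why derivation, unlike
# inflection, is not a finite domain (example from Krieger &
# Nerbonne 1993).

def vor_prefixation(lexeme):
    """Derive 'Vor-X' ('pre-X') from lexeme X."""
    return {"phon": "vor" + lexeme["phon"],
            "sem": ("pre", lexeme["sem"])}

silbe = {"phon": "silbe", "sem": "syllable"}
vorsilbe = vor_prefixation(silbe)
vorvorsilbe = vor_prefixation(vorsilbe)     # rule reapplies to its output
print(vorvorsilbe["phon"])                  # 'vorvorsilbe'
print(vorvorsilbe["sem"])                   # ('pre', ('pre', 'syllable'))
```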

# **2.2 Riehemann (1998)**

The work of Riehemann (1998) takes as its starting point the previous proposal laid out in Krieger & Nerbonne (1993), treating derivational processes as partial descriptions of lexemes that are organised in an inheritance type hierarchy and that relate a derived lexeme to a morphological base. Her approach, however, expands on the previous proposal in two important respects. First, she argues against a word-syntactic approach and suggests instead that only the morphological base, a lexeme, should be considered a sign. Affixes or modification of the base, if any, are syncategorematically introduced by rule application. In contrast to the word-syntactic approach by Krieger & Nerbonne (1993), Riehemann's conceptualisation of derivation as unary rules integrated into the hierarchical lexicon does not give any privileged status to concatenative word formation processes: as a result, it generalises more easily to modificational formations, conversion, and (subtractive) back formations (e.g. *self-destruct* < *self-destruction*).

Second, she conducts a detailed empirical study of *-bar* '-able' affixation in German and shows that besides regular *-bar* adjectives, which derive from transitive verbs and introduce both modality and a passivisation effect, there is a broader class of similar formations which adhere to some of the properties, but not others.

Figure 1: Type hierarchy of German *-bar* derivation according to Riehemann (1998: 64)


She concludes that multiple inheritance type hierarchies lend themselves to capturing the variety of the full empirical pattern while at the same time providing the necessary abstraction in terms of more general supertypes from which individual subclasses may inherit.

Figure 1 on page 950 provides the extended hierarchy suggested by Riehemann (1998). The type for regular *-bar* adjectives given in (1) is treated as a specific subtype that inherits inter alia from more general supertypes that capture the salient properties that characterise the regular formation, e.g. *anfechtbar* 'contestable', but which also hold to some extent for subregular *-bar* adjectives, e.g. *eßbar* 'edible'.<sup>2</sup>

One property that is almost trivial concerns suffixation of *-bar*, and it holds for the entire class. Suffixation is no exclusive property of *-bar* adjectives, so this property can be abstracted out into the supertype *suffixed* in (2): the type *bar-adj* in Figure 1 inherits this property and specifies the concrete shape of the list appended to the morphological base.

(2) $$\begin{bmatrix} \textit{suffixed} \\ \text{PHON}\ \boxed{1} \oplus \textit{list} \\ \text{MORPH-B}\ \left\langle \left[\, \text{PHON}\ \boxed{1} \,\right] \right\rangle \end{bmatrix}$$

<sup>2</sup>The feature geometry and some further details have been adapted to the conventions used in this book. For a version of Riehemann's lexical rule using the distinction between structural and lexical case (Przepiórkowski 2021, Chapter 7 of this volume) see Müller (2003).


A property which is common to most *-bar* adjectives in German is that they denote "possibility", as represented by the type constraint in (3). Exceptions include *zahlbar* 'payable', which denotes necessity instead.

(3) $$\begin{bmatrix} \textit{possibility} \\ \text{SYNSEM|LOC|CONT|NUC|RELN}\ \dots \end{bmatrix}$$

Clearly more specific, albeit fairly general still, is the passivisation effect observed with transitive bases, as it does not apply in the same way to verbal bases taking dative (*entrinnbar* 'escapable') or prepositional complements (*verfügbar* 'available/disposable') instead of an accusative, and it does not apply at all to intransitive bases (*brennbar* 'combustible').

(4) $$\begin{bmatrix} \textit{external} \\ \text{SYNSEM|LOC|CAT|SUBJ}\ \left\langle \text{NP:}\boxed{1} \right\rangle \\ \text{MORPH-B}\ \left\langle \left[\, \text{SYNSEM|LOC|CAT|COMPS}\ \left\langle \text{NP:}\boxed{1}, \dots \right\rangle \,\right] \right\rangle \end{bmatrix}$$

Regular *-bar* adjectives (1) inherit from all these supertypes, which accounts for most of their properties, while at the same time the overall hierarchy of *-bar* constructions captures the relatedness of regular *-bar* adjectives to subregular formations.
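The inheritance-based division of labour can be loosely sketched with Python's multiple inheritance, where each supertype contributes one of the properties discussed above. The class names, the concrete RELN value *possible*, and the method signatures are assumptions made for illustration only.

```python
# Sketch of Riehemann's supertypes as a multiple-inheritance
# hierarchy: regular -bar adjectives inherit suffixation, the
# possibility semantics, and the passivisation effect.

class Suffixed:
    SUFFIX = ""
    def phon(self, base):
        # PHON of the derived word = PHON of the base + suffix, cf. (2)
        return base + self.SUFFIX

class Possibility:
    # modal semantics shared by most -bar adjectives, cf. (3);
    # the concrete relation name is an assumption
    RELN = "possible"

class Passivization:
    def subj(self, base_comps):
        # the accusative complement of the base becomes the subject, cf. (4)
        return base_comps[0]

class RegBarAdj(Suffixed, Possibility, Passivization):
    SUFFIX = "bar"          # concrete shape of the appended list

adj = RegBarAdj()
print(adj.phon("anfecht"), adj.RELN)   # anfechtbar possible
print(adj.subj(["NP[acc]"]))           # NP[acc]
```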

# **2.3 Koenig (1999)**

Koenig's work on lexical relations has made several important contributions to our understanding of morphological processes within the HPSG lexicon. Based on joint work with Dan Jurafsky (Koenig & Jurafsky 1995), he uses Online Type Construction to turn the hierarchical lexicon, which is actually a static system, into a dynamic, generative device. This enables him in particular to make a systematic distinction between open types for regular, productive formations, and closed types for subregular and irregular ones.

Koenig (1999) takes issue with the early conception of lexical rules as meta-level rules either deriving an expanded lexicon from a base lexicon (generative lexical rules), or else establishing relations between items within the lexicon (redundancy rules). He argues on the basis of grammatical function change, such as the English passive, that systematic alternations are amenable to underspecification in the hierarchical lexicon, once cross-classification between types can be performed dynamically.

Online Type Construction depends on a hierarchical lexicon that is organised into an AND/OR network of conjunctive dimensions (represented in boxed capitals) and disjunctive types (in italics). While in a standard type hierarchy any two types that do not have a common subtype are understood as incompatible, Online Type Construction derives new subtypes by intersection of leaf types from different dimensions. Leaf types within the same dimension are still considered disjoint. Thus, dimensions define the range of inferrable cross-classifications between types, without having to statically list these types in the first place.

In Koenig's conception of the lexicon as a type underspecified hierarchical lexicon (TUHL), the unexpanded lexicon is just a system of types. Concrete lexical items, i.e. instances, are inferred from these by means of Online Type Construction.

Let us briefly consider a simple example for the active/passive alternation: the minimal lexical type hierarchy in Figure 2 is organised into two dimensions, one representing specific lexemes, the other specifying active voice and passive voice linking patterns for lexemes. Concrete lexical items are now derived by cross-classifying exactly one leaf type from one dimension with exactly one leaf type from the other.

Figure 2: Online type construction

An important aspect of this integration of alternations into the hierarchical lexicon is that it becomes quite straightforward to deal with lexical exceptions in a systematic way. The key to this is pre-typing, as illustrated in Figure 3: in English, for instance, some transitive verbs, like possessive *have*, fail to undergo passivisation. Rather than marking these verbs diacritically with exception features, pre-typing to the active pattern precludes their cross-classification with the passive pattern, because leaf types within a dimension are disjoint and pre-typing makes this type already a type in both dimensions.

Figure 3: Exceptions via pre-typing
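A sketch of the inference step follows, with the effect of pre-typing encoded directly as a set of blocked pairs; all names are invented for illustration.

```python
# Sketch of Online Type Construction: lexical items are inferred
# by intersecting one leaf type per dimension (cf. Figures 2-3);
# pre-typing is represented here only by its effect, a blocked pair.

from itertools import product

lexemes  = ["love", "devour", "have(poss)"]
linkings = ["active", "passive"]

# possessive 'have' is pre-typed to the active pattern, which
# precludes cross-classification with the passive pattern:
blocked = {("have(poss)", "passive")}

def online_type_construction(dim1, dim2, blocked):
    """Infer lexical items by intersecting one leaf type per dimension."""
    return [p for p in product(dim1, dim2) if p not in blocked]

for lexeme, linking in online_type_construction(lexemes, linkings, blocked):
    print(f"{lexeme} + {linking}")
# love + active, love + passive, devour + active, devour + passive,
# have(poss) + active  -- 'have(poss) + passive' is never constructed
```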

Online Type Construction successfully integrates systematic alternations into type hierarchies. A crucial limitation is, however, that Online Type Construction is confined to finite domains: by itself, it is suitable for inflection and possibly quasi-inflectional, non-recursive processes such as grammatical function change, while a full treatment of derivational processes will still require recursive rule types, which remain a possibility in Koenig's general approach to derivational morphology.<sup>3</sup>

The works of Riehemann (1998) and Koenig (1999) had considerable impact on subsequent work on word formation, both within the framework of HPSG and beyond. Within HPSG, several studies of French derivation and compounding directly build on these proposals (e.g. Tribout 2010; Desmets & Villoing 2009). Outside HPSG, the development of Construction Morphology (Booij 2010) has largely been influenced by the HPSG work on word formation within a hierarchical lexicon.

<sup>3</sup>Blevins (2003) discusses the interaction between passives and impersonals in Baltic and Slavic languages and its relevance to some of the issues I just discussed. See Avgustinova et al. (1999) for an account along these lines. Müller (2013: 925–927) and Müller & Wechsler (2014: Section 8.1) take a highly sceptical stance, arguing that interactions in grammatical function change depend on the possibility for one lexical rule to apply to the output of another, or, as in the case of Turkish causatives, a rule may even apply more than once.

# **3 Inflection**

# **3.1 Classical challenges of inflectional systems**

Ever since Matthews (1972), it has been recognised in morphological theory that inflectional systems do not privilege one-to-one relations between function and form, but must rather be conceived of as many-to-many (*m*:*n*) in the general case. Thus, while rule-by-rule compositionality can count as the success story of syntax and semantics, this does not hold in the same way for inflection.

Classical problems that illustrate the many-to-many nature of inflection include cumulation, where a single form expresses multiple morphosyntactic properties. An extreme example of cumulation is contributed by the Latin verb *am-o* 'love-1.SG.PRS.IND.AV', which contrasts e.g. with forms *amā-v-i* 'love-PRF-1.SG.AV', where perfective tense is expressed by a discrete exponent *-v*, or present subjunctive *am-ē-m* 'love-SUBJ-1.SG.AV' where mood is expressed by a marker of its own.

The mirror image of cumulation is extended (or multiple) exponence: here, a single property is expressed by more than one exponent. This is exemplified by German circumfixal past participles, such as *ge-setz-t* 'PPP-put-PPP', which is marked by a prefix *ge-* and a suffix *-t*, jointly expressing the perfect/passive participial property. Another case of multiple exponence is contributed by Nyanja, which marks certain adjectives with a combination of two agreement markers, as discussed on page 977 in Section 4.3. See Caballero & Harris (2012) and Harris (2017) for a typological overview.

Possibly more widely attested than pure multiple exponence is overlapping exponence, i.e. the situation where two exponents both express the same property, but at least one of them also expresses some other property: e.g. many German nouns form the dative plural by suffixation of *-n*, but plural marking is often signalled additionally by stem modification (*Umlaut*): while *Kutter-n* 'tug(M)-DAT.PL' merely shows cumulation of case and number, *Mütter-n* 'mother(F).PL-DAT.PL' exhibits plural marking in both the inflectional ending and the fronting of the stem vowel (cf. singular *Mutter* 'mother.SG').

An extremely widespread form of deviation from a one-to-one correspondence between form and function is zero exponence, where some morphosyntactic properties do not give rise to any exponence. In English, regular plural nouns are formed by suffixation of *-s*, as in *jeep/jeeps*, but we also find cases, such as *sheep*, where no overt exponent of plural is present. Likewise, the past tense of English verbs is regularly signalled by suffixation of *-ed*, as with *flip/flipped* or British English *fit/fitted*, but again, there are forms such as *hit/hit* where past is not overtly marked. In German, nouns inflect for four cases and two numbers, yielding eight cells. However, in some paradigms very few cells are actually overtly marked. The feminine noun *Brezen* 'pretzel' does not take any inflectional marking at all. Similarly, one of the most productive masculine/neuter paradigms, witnessed by *Rechner* 'computer', only shows overt marking for two cells, the genitive singular (*Rechner-s*) and the dative plural (*Rechner-n*), all other forms being bare.

The many-to-many nature of inflectional morphology clearly has repercussions as to how the system is organised. One way to make sense of inflection is in terms of paradigmatic opposition: while it may be hard to figure out what exactly the meaning is of zero case/number marking in German, we can easily establish the meaning of a form like *Rechner* in opposition to the non-bare forms *Rechner-s* 'computer-GEN.SG' and *Rechner-n* 'computer-DAT.PL'. This is even more the case once we consider different paradigms, i.e. different patterns of opposition: the invariant form *Brezen* 'pretzel', for instance, has a wider denotation than *Rechner*, whereas *Auto* 'car' has a narrower denotation, standing in opposition to more cells, cf. Table 21.1(c).

The recognition of paradigms has led to a number of works on syncretism (see, e.g. Baerman et al. 2005), i.e. cases of systematic or accidental identity of form across different cells of the paradigm. Syncretism can give rise to splits of different types (Corbett 2015): natural splits, where syncretic forms share some (non-disjunctive) set of features, Pāṇinian splits, where syncretism corresponds to some default form, and finally morphomic splits, where syncretic forms neither form a natural class nor do they lend themselves to be analysed as a default.

In Table 21.1(a), we find a perfect alignment of syncretic forms along the number dimension. By contrast, Table 21.1(b) illustrates the case discussed above, where two specific cells constitute overrides to a general default pattern (here zero exponence). Default forms, however, need not involve zero exponence: German features a Pāṇinian split in another paradigm where all forms are marked with *-en* (e.g. *Mensch-en* 'human(s)'), with the exception of the nominative singular (*Mensch* 'human.NOM.SG'), which constitutes a zero override. Table 21.1(c) illustrates how a Pāṇinian split in the singular can combine with a natural split between singular and plural. Finally, Table 21.1(d) illustrates what could be taken as a morphomic split, where there is no natural alignment between form and function, and no clear way to establish what is the default and what is the override (cf., however, Crysmann & Kihm 2018 for an analysis of the Old French declension system).

Table 21.1: Paradigmatic splits: (a) natural split, (b) Pāṇinian split, (c) natural & Pāṇinian split, (d) morphomic split

The patterns we have just seen have two clear implications for morphological theory: first, many morphologists believe that a version of Pāṇini's Principle, whereby more specific forms can block more general ones, must be part of morphological theory, since otherwise many generalisations will be lost. Second, the many-to-many nature of exponence has a direct impact on the representation of inflectional meaning, which we will explore in the next two subsections.

# **3.2 Typology of inflectional theories**

Current morphological theories differ as to how they establish the relation between a complex form and its parts and how this relation determines the relation between form and function. The classical morpheme-based view of morphology, where inflectional meaning is a property of lexical elements, such as morphemes, constitutes the textbook case of what Hockett (1954) has dubbed the Item-and-Arrangement (IA) model. The general criticism that has been raised against such models is that they fail to recognise the paradigmatic structure of inflectional morphology and furthermore need to make extensive appeal to zero morphemes (see Anderson 1992 for a systematic criticism).

The alternative model Hockett (1954) discusses is the Item-and-Process (IP) model, where inflectional meaning is introduced syncategorematically by way of rule application. Such approaches are less prone to difficulties with non-concatenative processes like modification and zero exponence. However, IP approaches still do not recognise the *m* : *n* nature of inflectional morphology and are therefore expected to have problems with e.g. multiple exponence.

As a reaction to Matthews (1972), new approaches to inflectional morphology were developed that take the notion of paradigm much more seriously. Theories such as A-Morphous Morphology (Anderson 1992) or Paradigm Function Morphology (Stump 2001) have been classified into the Word-and-Paradigm (WP) category. Crucially, such models locate inflection at the level of the word and rely on realisation rules that associate the word's inflectional properties with exponents that serve to express them. WP approaches contrast with IA in that they do not recognise (classical) morphemes. They differ from IP in that there is neither a notion of incrementality, i.e. that inflectional rules must be information-increasing, nor an assumption that rules are necessarily one-to-one correspondences between (alteration of) form and meaning.

# **3.3 HPSG approaches to inflection**

Over the years, several different proposals have been made regarding the treatment of inflectional morphology in HPSG. From the point of view of the underlying logic, there is no a priori expectation as to the type of model (IA, IP, WP) that would be most compatible with HPSG's basic assumptions. Indeed, every one of the three models has been proposed at some point. However, the arguments against morpheme-based models put forth by Matthews (1972), Spencer (1991), Anderson (1992) and Stump (2001) have been taken quite seriously within the HPSG community, such that there is a clear preference for IP or WP models over IA, notable exceptions being Van Eynde (1994) and, more recently, Emerson & Copestake (2015).

One of the most common ways to express lexical alternations is by means of (description-level) lexical rules. Morphophonological changes effected by such a rule are typically captured by some (often undefined) function on the phonology of the daughter. Since morphological marking is tied directly to rule application, approaches along these lines constitute an instance of an IP model of morphology. Work on morphology in grammar implementation typically follows this line: in platforms like the Linguistic Knowledge Builder (LKB; Copestake 2002, see also Bender & Emerson 2021: 1116, Chapter 25 of this volume), character unification serves to provide statements of morphophonological changes that can be attached to (unary) lexical rules. See Goodman & Bender (2010) for a proposal as to how requirements for certain inflections and dependencies between morphological rules, e.g. the parts of extended or overlapping exponence, can be captured in a more systematic way, and Crysmann (2015; 2017b) for implementations of non-concatenative morphology.

A notable exception to the function approach is the work of Olivier Bonami (Bonami & Samvelian 2015; Bonami 2015): he argued for the incorporation of an external formal model of morphology into HPSG, namely Paradigm Function Morphology (=PFM; Stump 2001), and showed specifically that the integration should be done at the level of the word, rather than individual lexical rules, in order to reap the benefits of a WP model over an IP model. In a similar vein, Erjavec (1994) explores how a model such as PFM can be cast in typed feature descriptions and observes that the only non-trivial aspect of such an enterprise relates to Pāṇinian competition, which requires a change to the underlying logic. See Section 4.3 for detailed discussion.

In the area of cliticisation, several sketches of WP models have been proposed: e.g. Miller & Sag (1997) provide an explication of the function that realises the pronominal affix cluster, but the proposal was never meant to scale up to a full formal theory of inflection. Crysmann (2003) suggested a realisational, morph-based model of inflection. While certainly more worked out, the approach was too narrowly tailored to the treatment of clitic clusters.

### **Word-based approaches**

**Krieger & Nerbonne (1993)** As stated above, probably one of the first approaches to morphology in HPSG was developed by Krieger & Nerbonne (1993). What they propose is essentially an instance of a WP model, since they use distributed disjunctions to directly represent entire paradigms, matching exponents with the features they express. Most interestingly, their approach to inflection contrasts quite starkly with their work on derivation (Krieger & Nerbonne 1993), which is essentially a word-syntactic, i.e. morpheme-based, approach.

(5) represents an encoding of the present indicative paradigm for German (cf. the endings in Table 21.2). The distributed disjunction, marked by \$1, associates each element in the disjunctive ENDING value with the corresponding element in the disjunctive AGR value.


Table 21.2: Regular present indicative endings for German verbs

|     | SG    | PL    |
|-----|-------|-------|
| 1st | *-e*  | *-en* |
| 2nd | *-st* | *-t*  |
| 3rd | *-t*  | *-en* |


(5) Encoding paradigms by distributed disjunctions (Krieger & Nerbonne 1993: 105):

$$\begin{bmatrix}
\text{MORPH} & \begin{bmatrix}
\text{STEM} & \boxed{1}\\
\text{ENDING} & \boxed{2}\ \{\$_1\ \text{"e"}, \text{"st"}, \text{"t"}, \text{"en"}, \text{"t"}, \text{"en"}\}\\
\text{FORM} & \boxed{1} + \boxed{2}
\end{bmatrix}\\[6pt]
\text{SYNSEM} & \begin{bmatrix}
\text{LOCAL} & \begin{bmatrix} \text{AGR}\ \{\$_1\ \textit{1sg}, \textit{2sg}, \textit{3sg}, \textit{1pl}, \textit{2pl}, \textit{3pl}\} \end{bmatrix}
\end{bmatrix}
\end{bmatrix}$$

They further argue that partially regular formations, such as *sollen* 'should', which has no ending in the first and third person singular, can be captured by means of default inheritance, overriding the ENDING value as in (6).

(6) Partial irregularity by overriding default endings (Krieger & Nerbonne 1993: 105):
$$\begin{bmatrix}
\text{MORPH} & \begin{bmatrix}\text{ENDING}\ \{\$_1\ \text{""}, \text{"st"}, \text{""}, \text{"n"}, \text{"t"}, \text{"n"}\}\end{bmatrix}
\end{bmatrix}$$

Suppletive forms, as for the auxiliary *sein* 'be', will equally inherit from (5), yet override the FORM value, cf. (7).

(7) Suppletive verbs (Krieger & Nerbonne 1993: 106):
$$\begin{bmatrix}
\text{MORPH} & \begin{bmatrix}\text{FORM}\ \{\$_1\ \text{"bin"}, \text{"bist"}, \text{"ist"}, \text{"sind"}, \text{"seid"}, \text{"sind"}\}\end{bmatrix}
\end{bmatrix}$$
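To make the mechanism concrete, the following Python sketch emulates distributed disjunctions procedurally: the two co-indexed disjunctions are resolved in tandem, a default is overridden by supplying a new ENDING value, and suppletion by supplying FORM wholesale. All names are mine; for readability, the plural endings are written out as full *-en* where (6) prints bare *n*.

```python
# Sketch of distributed disjunctions (Krieger & Nerbonne 1993): two
# disjunctions tagged $_1 must be resolved in tandem, pairing the n-th
# ending with the n-th agreement value. Data and names are illustrative.

AGR = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]        # $_1 over AGR
REGULAR = ["e", "st", "t", "en", "t", "en"]             # $_1 over ENDING

def paradigm(stem, endings=REGULAR, forms=None):
    """Resolve both $_1 disjunctions to the same alternative. `endings`
    overrides the default ENDING value (cf. (6)); `forms` overrides the
    FORM value wholesale, modelling suppletion (cf. (7))."""
    if forms is not None:
        return dict(zip(AGR, forms))
    return {agr: stem + end for agr, end in zip(AGR, endings)}

print(paradigm("lieb"))                                            # regular
print(paradigm("soll", endings=["", "st", "", "en", "t", "en"]))   # cf. (6)
print(paradigm("sei", forms=["bin", "bist", "ist", "sind", "seid", "sind"]))  # cf. (7)
```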

The approach by Krieger & Nerbonne (1993) has not been widely adopted, partially because few versions of HPSG support default inheritance and even fewer support distributed disjunctions. Koenig (1999: 176–178) also argues against distributed disjunctions on independent theoretical grounds, suggesting that the approach will not scale up to morphologically more complex systems.

**Koenig (1999)** Similar to Krieger & Nerbonne (1993), Koenig (1999) pursues a word-based approach to inflection, in contrast to the IP approach he developed for derivation. He focuses on the distinction between regular, subregular and irregular formations and explores how these can be represented in a systematic way in lexical type hierarchies using Online Type Construction.

He starts from the observation that words inflect along a finite number of different inflectional dimensions and that, within each dimension, pairings of exponents and morphosyntactic features stand in paradigmatic opposition. Furthermore, neither completely uninflected roots nor partially derived words (e.g. those lacking agreement information) should be able to function as lexical signs, so it is necessary to enforce that inflection be applied. The AND/OR logic of dimensions and types he proposed appears to be very well suited to account for these properties.

Table 21.3: Future forms of the Swahili verb *taka* 'want'

|     | Affirmative | Negative      |
|-----|-------------|---------------|
| 1SG | ni-ta-taka  | si-ta-taka    |
| 2SG | u-ta-taka   | hu-ta-taka    |
| 3SG | a-ta-taka   | ha-ta-taka    |
| 1PL | tu-ta-taka  | ha-tu-ta-taka |
| 2PL | m-ta-taka   | ha-m-ta-taka  |
| 3PL | wa-ta-taka  | ha-wa-ta-taka |


For illustration, let us consider a subset of his analysis of Swahili verb inflection. As shown in Table 21.3, Swahili verbs (minimally) inflect for polarity, tense and subject agreement.<sup>4</sup>

Koenig (1999: Section 5.5.2) suggests that the inflectional morphology of Swahili can be directly described at the word level. Accordingly, he proposes a type hierarchy of word-level inflectional constructions as given in Figure 4.

As shown in Table 21.3, tensed verbs with plural subjects take three prefixes in the negative and two in the positive, with the exponent of negative preceding the exponent of subject agreement, preceding in turn the exponent of tense. Koenig (1999) proposes three dimensions of inflectional construction types that correspond to the three positional prefix slots. Since dimensions are conjunctive, a well-formed Swahili word must inherit from exactly one type in each dimension. As he states, the AND/OR logic of dimensions and types is the declarative analogue of the conjunctive rule blocks and disjunctive rules in A-Morphous Morphology (Anderson 1992).

Figure 4: Koenig's (1999: 171) constructional approach to Swahili position classes

Types in the dimensions are partial word-level descriptions of (combinations of) prefixes. As shown by the sample types in (8), these partial descriptions pair some morphosyntactic properties (µ-FEAT) with constraints on the prefixes: the type *¬1sg-neg*, for instance, constrains the first prefix slot to be *ha-*, while leaving the other slots underspecified. These will be further constrained by appropriate types from the other two dimensions. Likewise, the type *1sg-pos* constrains slot 2 to be *ni-*, but specifies the further requirement that the verb be [NEG −].

<sup>4</sup>The full paradigm recognises inflection for object agreement and relatives, but this shall not concern us here, it being sufficient that inflectional paradigms may be large but finite.

(8) Sample types for Swahili:

$$\begin{array}{l}
\text{a. } \neg\textit{1sg-neg}\\[2pt]
\begin{bmatrix}
\text{PH|AFF} & \begin{bmatrix}\text{PREF} & \langle \text{ha}, \dots, \dots \rangle\end{bmatrix}\\
\text{CAT|HEAD} & \begin{bmatrix}\mu\text{-FEAT} & \begin{bmatrix}\text{NEG} & +\end{bmatrix}\end{bmatrix}
\end{bmatrix}\\[10pt]
\text{b. } \textit{1sg-pos}\\[2pt]
\begin{bmatrix}
\text{PH|AFF} & \begin{bmatrix}\text{PREF} & \langle \dots, \text{ni}, \dots \rangle\end{bmatrix}\\
\text{CAT|HEAD} & \begin{bmatrix}\mu\text{-FEAT} & \begin{bmatrix}\text{NEG} & -\\ \text{SUBJ-AGR} & \begin{bmatrix}\text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}\end{bmatrix}\end{bmatrix}
\end{bmatrix}\\[10pt]
\text{c. } \textit{1sg-neg}\\[2pt]
\begin{bmatrix}
\text{PH|AFF} & \begin{bmatrix}\text{PREF} & \langle \text{si}, \langle\rangle, \dots \rangle\end{bmatrix}\\
\text{CAT|HEAD} & \begin{bmatrix}\mu\text{-FEAT} & \begin{bmatrix}\text{NEG} & +\\ \text{SUBJ-AGR} & \begin{bmatrix}\text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}\end{bmatrix}\end{bmatrix}
\end{bmatrix}
\end{array}$$
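Schematically, the conjunctive AND/OR logic of these dimensions can be sketched in Python as follows. The encoding (dimension names, slot numbers, feature strings) is mine and purely illustrative; the pre-linked portmanteau type is modelled as a leaf occupying two dimensions at once, as discussed directly below.

```python
# Sketch of Koenig-style conjunctive dimensions for Swahili prefixes:
# a well-formed word inherits from exactly one leaf per dimension, so
# the chosen types' dimension sets must partition {POL, AGR, TNS}.
from itertools import combinations

TYPES = {
    # name: (dimensions occupied, slot constraints, features expressed)
    "neg":     ({"POL"}, {1: "ha"}, {"NEG": "+"}),
    "pos":     ({"POL"}, {1: ""},   {"NEG": "-"}),
    "1sg":     ({"AGR"}, {2: "ni"}, {"SUBJ": "1sg"}),
    "2pl":     ({"AGR"}, {2: "m"},  {"SUBJ": "2pl"}),
    # pre-linked portmanteau: a leaf of both POL and AGR, cf. 1sg-neg
    "1sg-neg": ({"POL", "AGR"}, {1: "si", 2: ""}, {"NEG": "+", "SUBJ": "1sg"}),
    "fut":     ({"TNS"}, {3: "ta"}, {"TNS": "fut"}),
}
DIMS = {"POL", "AGR", "TNS"}

def words(stem):
    for n in (2, 3):
        for combo in combinations(TYPES, n):
            dims = [TYPES[t][0] for t in combo]
            if set().union(*dims) != DIMS or sum(map(len, dims)) != len(DIMS):
                continue                   # not a partition of the dimensions
            slots, feats, ok = {}, {}, True
            for t in combo:
                for tgt, src in ((slots, TYPES[t][1]), (feats, TYPES[t][2])):
                    for k, v in src.items():
                        ok = ok and tgt.get(k, v) == v   # unification clash?
                        tgt[k] = v
            if ok:
                yield "".join(slots.get(i, "") for i in (1, 2, 3)) + stem, feats

for form, feats in words("taka"):
    print(form, feats)
# si-ta-taka surfaces via the pre-linked type; ha-ni-ta-taka is also
# generated here and is only excluded by blocking (cf. Section 4.3).
```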

Pre-linking of types finally permits a straightforward treatment of cumulation across positional slots: e.g. the type *1sg-neg* simultaneously satisfies requirements for the first and second slot, constraining one of the prefixes to be the portmanteau *si-* and the other one to be empty. Thus, by adopting a constructional perspective on inflectional morphology, Koenig (1999) can capture interactions between different affix positions. There is, however, one important limitation to a direct word-based perspective: situations where exponents from the same set of markers may (repeatedly) co-occur within a word cannot be captured without an intermediate level of rules. Such a situation is found with subject and object agreement markers in Swahili – so-called parallel position classes (Stump 1993; Crysmann & Bonami 2016) – as well as with exuberant exponence in Batsbi (Harris 2009; Crysmann 2021). We shall come back to the issue in Section 4.5. Finally, since exponents are directly represented on an affix list under Koenig's approach, position and shape cannot always be underspecified independently of each other, which makes it more difficult to capture variable morphotactics (see Section 4.4).

An aspect of (inflectional) morphology that Koenig (1999) pays particular attention to is the relation between regular, subregular and irregular formations. He approaches the issue on two levels: the level of knowledge representation and the level of knowledge use.

At the representational level, regular formations, e.g. past tense *snored*, are said to be intensionally defined in terms of regular rule types that license them: results of regular rule application are consequently not listed in the lexicon. Rather, they are constructed either by Online Type Construction or by rule application. Irregular formations, by contrast, are fully listed, e.g. the past tense form *took* of a verb like *take*. Most interesting are subregular types, e.g. *sing/sang/sung* or *ring/rang/rung*: like irregulars, class membership is extensionally defined by enumeration, but the type hierarchy can still be exploited to abstract out common properties.

With regular formations being defined in terms of productive schemata, an important task is to preempt any subregular or irregular root from undergoing the regular, productive pattern. Koenig (1999) discusses three different approaches in depth: a feature-based approach, and two ways of invoking Pāṇini's Principle. As for the former, he shows that the cost associated with diacritic exception features is actually minimal: it is sufficient to specify irregular and subregular bases as [IRR +] and constrain the regular rule to [IRR −]. Thus, such diacritics need not be stated for the large and open class of regular, productive bases. Despite the relatively harmless effects of the feature-based approach, it should be kept in mind that this approach will not scale up to a full treatment of Pāṇinian competition.<sup>5</sup>
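A minimal Python sketch of this diacritic mechanism, with hypothetical lexical entries of my own devising:

```python
# Sketch of the diacritic-feature approach: irregular and subregular
# bases are marked [IRR +]; the regular rule is constrained to [IRR -].
# Only the closed class of exceptions needs the mark.

LEXEMES = [
    {"stem": "snore", "past": None},                  # regular, unmarked
    {"stem": "take",  "past": "took", "irr": True},   # listed irregular
    {"stem": "sing",  "past": "sang", "irr": True},   # listed subregular
]

def past_tense(lex):
    if lex.get("irr"):            # regular rule requires [IRR -]
        return lex["past"]        # listed form wins; *taked is never built
    return lex["stem"] + "d"      # productive pattern (orthographic -d here)

for lex in LEXEMES:
    print(lex["stem"], "->", past_tense(lex))
```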

Koenig (1999) proposes two variants of a morphological and/or lexical blocking theory. In essence, he builds on a previous formulation by Andrews (1990) within LFG to define a notion of morphological competition based on subsumption. Since competition is between different realisations for the same morphological features, he applies a restrictor on form-related features to then establish competition in terms of unilateral subsumption (⊏): i.e. a rule description that is more specific than some other rule (modulo form-oriented features) will take precedence. I shall not go into the details of Koenig's Blocking Principle here, since we shall come back to a highly similar formulation of Pāṇinian competition in Section 4.3. Koenig (1999) discusses two different ways this preemption can be accomplished: one is a compilation approach, where complementation is used to make the more general type disjoint from its competitor, whereas the other relegates the problem to the area of knowledge use. While the usage-based interpretation may appear preferable, because it does not require expansion of the lexical type hierarchy, it leaves open the question of why this kind of competition is mainly restricted to lexical knowledge. On the other hand, the static compilation approach requires prior expansion of the type-underspecified lexicon in order to give sound results under restriction, a point made in Crysmann (2003).

<sup>5</sup>This is because, first, every default/override pair would need to be stipulated, and second, if a paradigm has defaults in different dimensions (e.g. a default tense, or a default agreement marking), each would need its own diacritic feature.

To summarise, several WP proposals have been made to replace the IP model tacitly assumed by many HPSG syntacticians, which merely attaches some morphophonological function to a lexical rule. Bonami (Bonami & Samvelian 2008; Bonami & Boyé 2006; 2007; Bonami 2011) proposed directly "plugging in" a credible external framework, namely Paradigm Function Morphology (Stump 2001), while Koenig (1999) suggested a word-based model. Neither approach has proven fully satisfactory. Use of an external theory, such as PFM, raises the question of why we should need a different formalism in order to implement a theory of inflection, rather than exploiting the power of inheritance and cross-classification in hierarchies of typed feature structure descriptions. Word-based approaches suffer from problems of scalability with morphotactically complex systems. These issues led to the development of Information-based Morphology (Crysmann & Bonami 2016), which will be discussed in the next section.

# **4 Information-based Morphology**

Information-based Morphology (Crysmann & Bonami 2016) is a theory of inflectional morphology that systematically builds on HPSG-style typed feature logic in order to implement an inferential-realisational model of inflection. As the name suggests, in reference to Pollard & Sag (1987), it aims at complementing HPSG with a subtheory of inflection that systematically explores underspecification and cross-classification as the central devices for morphological generalisations.

IbM clearly builds on previous HPSG work on morphology and the lexicon: Online Type Construction (Koenig & Jurafsky 1995) can be cited here in the context of the underlying logic. Similarly, the decision to represent morphotactics in terms of a flat list of segmentable exponents (=morphs) draws on previous work by Crysmann (2003).

# **4.1 Architecture and principles**

The architecture of IbM is quite simple: essentially, words are assumed to introduce a feature INFL that encapsulates all features relevant to inflection.<sup>6</sup> At the top level, these comprise MPH, a partially ordered list of exponents (*m(or)ph*), a morphosyntactic (or morphosemantic) property set MS associated with the word, and finally RR, a set of realisation rules that establish the correspondence between exponents and morphosyntactic properties.

(9)
$$\textit{word} \Rightarrow \begin{bmatrix}
\text{INFL} & \begin{bmatrix}
\text{MPH} & \textit{list}(\textit{mph})\\
\text{RR} & \textit{set}(\textit{realisation-rule})\\
\text{MS} & \textit{set}(\textit{msp})
\end{bmatrix}
\end{bmatrix}$$
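For readers who find it helpful to see this geometry operationally, here is a minimal Python rendering of (9) and (12) as data classes. The encoding of properties as pairs and of phonology as plain strings is a simplifying assumption of mine, not part of IbM.

```python
# Sketch of the IbM feature geometry; purely illustrative encodings.
from dataclasses import dataclass, field

@dataclass
class Mph:                 # cf. (12): a morph pairs a shape with a position
    ph: str                # PH: phonological shape (a string, for simplicity)
    pc: int                # PC: position class index

@dataclass
class Rule:                # cf. (14): a realisation rule
    mud: frozenset         # MUD: the properties expressed by the rule
    mph: list              # MPH: the morphs realising them

@dataclass
class Word:                # cf. (9): the INFL bundle of a word
    ms: frozenset          # MS: full morphosyntactic property set
    rr: list               # RR: realisation rules licensing the word
    mph: list = field(default_factory=list)   # MPH: the word's morphs

# German ge-setz-t 'put', cf. (10)-(11); the stem is introduced by a rule
# of its own so that the rules' contributions add up to the whole word.
gesetzt = Word(
    ms=frozenset({("LID", "setzen"), ("TAM", "ppp")}),
    rr=[Rule(frozenset({("LID", "setzen")}), [Mph("setz", 0)]),
        Rule(frozenset({("TAM", "ppp")}), [Mph("ge", -1), Mph("t", 1)])],
    mph=[Mph("ge", -1), Mph("setz", 0), Mph("t", 1)],
)
print("".join(m.ph for m in sorted(gesetzt.mph, key=lambda m: m.pc)))  # gesetzt
```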

From the viewpoint of inflectional morphology, words can be regarded as associations between a phonological shape (PH) and a morphosyntactic property set (MS), the latter including, of course, information pertaining to lexeme identity. This correspondence can be described in a maximally holistic fashion, as shown in (10), where a phonological form is paired with information about lexemic identity (LID) and a morphosyntactic property (TAM). Throughout this section, I shall use German (circumfixal) passive/past participle (*ppp*) formation, as witnessed by *ge-setz-t* 'put', for illustration.

(10)
$$\begin{bmatrix}
\text{PH} & \left\langle \text{gesetzt} \right\rangle\\[4pt]
\text{INFL} & \begin{bmatrix}\text{MS} & \left\{ \begin{bmatrix}\text{LID} & \textit{setzen}\end{bmatrix}, \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix} \right\}\end{bmatrix}
\end{bmatrix}$$

<sup>6</sup>For the purposes of this chapter, I shall make the somewhat simplifying assumption that inflection is a property exclusively associated with words. However, Koenig & Michelson (2020) present compelling evidence from nominalisation in Oneida, showing that derivational processes in this language may target (partially) inflected bases, including nominalisation of aspectually inflected verbal stems, as well as incorporation of inflected and derived nominals into polysynthetic verbs. It therefore seems necessary to generalise the interface between lexical types and inflectional morphology in such a way that realisational morphology can be applied to sub-word units within a derivational chain.


Since words in inflectional languages typically consist of multiple segmentable parts, realisational models provide means to index position within a word: while in A-Morphous Morphology (=AM; Anderson 1992) and Paradigm Function Morphology (=PFM; Stump 2001) ordered rule blocks perform this function, IbM uses a list of morphs (MPH) to explicitly represent exponents. The sample word-level representation in (11) illustrates the kind of information represented on the MPH list and the MS set.

(11) Structured association of form (MPH) and function (MS)

$$\begin{array}{l}
\text{a. Word:}\\[2pt]
\begin{bmatrix}
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\\ \text{PC} & -1\end{bmatrix}, \begin{bmatrix}\text{PH} & \langle\text{setz}\rangle\\ \text{PC} & 0\end{bmatrix}, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\\ \text{PC} & 1\end{bmatrix} \right\rangle\\[4pt]
\text{MS} & \left\{ \begin{bmatrix}\text{LID} & \textit{setzen}\end{bmatrix}, \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix} \right\}
\end{bmatrix}\\[14pt]
\text{b. Abstraction of circumfixation (1 : n):}\\[2pt]
\begin{bmatrix}
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\\ \text{PC} & -1\end{bmatrix}, \dots, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\\ \text{PC} & 1\end{bmatrix} \right\rangle\\[4pt]
\text{MS} & \left\{ \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix}, \dots \right\}
\end{bmatrix}
\end{array}$$

While elements of the MS set are either inflectional features or lexemic properties, the latter comprising e.g. information about the stem shape or inflection class membership, MPH is a list of structured elements (of type *mph*, cf. (12)) consisting of a phonological description (PH) paired with a position class index (PC), which serves to establish linear order of exponents. In some previous work on IbM, MPH was assumed to be a set, which is possible since order can be determined on the basis of PC indices alone. More recently, however, it is assumed to be a list, which is slightly redundant, yet permits much more parsimonious descriptions of principles and rules.

$$(12)\quad \textit{mph} \Rightarrow \begin{bmatrix} \text{PH} & \textit{list}(\textit{phon}) \\ \text{PC} & \textit{pos-class} \end{bmatrix}$$

The reification of position and shape as first-class citizens of morphological representation is one of the central design decisions of IbM: as a result, constraints on position and shape are amenable to the very same underspecification techniques as all other morphological properties. As a consequence, IbM eliminates structure from inflectional morphology, which clearly distinguishes this approach from other inferential-realisational approaches, such as PFM or AM, where order is derived from cascaded rule application. Although IbM recognises a minimal structure in terms of segmentable morphs, there is no hierarchy involved. AM and PFM, by contrast, reject derived structure, to borrow a term from Tree Adjoining Grammar, but this potential advantage is more than offset by their abundant use of derivation structure.

By means of underspecification, i.e. partial descriptions, one can easily abstract out realisation of the past participle property, arriving at a direct wordbased representation of circumfixal realisation, as shown in (11). Yet, a direct word-based description does not easily capture situations where the same association between form and content is used more than once in the same word, as is arguably the case for Swahili (Stump 1993; Crysmann & Bonami 2016; 2017) or Batsbi (Harris 2009; Crysmann 2021).

By introducing a level of realisation rules (RR), reuse of resources becomes possible. Rather than expressing the relation between form and function directly at the word level, IbM assumes that a word's description includes a specification of which rules license the realisation between form and content, as shown in (13).

(13) Association of form and function mediated by rule:
$$\begin{bmatrix}
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\\ \text{PC} & -1\end{bmatrix}, \begin{bmatrix}\text{PH} & \langle\text{setz}\rangle\\ \text{PC} & 0\end{bmatrix}, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\\ \text{PC} & 1\end{bmatrix} \right\rangle\\[6pt]
\text{MS} & \left\{ \begin{bmatrix}\text{LID} & \textit{setzen}\end{bmatrix}, \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix} \right\}\\[6pt]
\text{RR} & \left\{ \begin{bmatrix}\text{MUD} & \left\{\begin{bmatrix}\text{LID} & \textit{setzen}\end{bmatrix}\right\}\\ \text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{setz}\rangle\end{bmatrix} \right\rangle\end{bmatrix},\ \begin{bmatrix}\text{MUD} & \left\{\begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix}\right\}\\ \text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\end{bmatrix}, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\end{bmatrix} \right\rangle\end{bmatrix} \right\}
\end{bmatrix}$$

Recognition of a level of realisation rules that mediate between parts of form and parts of function slightly increases the complexity of morphological descriptions beyond a simple pairing of form-related MPH lists and function-related MS sets.

The crucial point about realisation rules is that they take care of parts of the inflection of an entire word independently of other realisation rules. Thus, in IbM, realisation rules are explicitly defined in terms of the set of morphosyntactic features they express, as opposed to contextually conditioning features. To that end, realisation rules introduce a feature MUD (Morphology Under Discussion), in addition to MPH and MS, in order to single out the morphosyntactic features that are licensed by application of the rule. Thus, MUD specifies the subset of the morphosyntactic property set MS that the rule serves to express, as detailed in (14).


(14)
$$\textit{realisation-rule} \Rightarrow \begin{bmatrix}
\text{MUD} & \boxed{1}\ \textit{set}(\textit{msp})\\
\text{MS} & \boxed{1} \cup \textit{set}(\textit{msp})
\end{bmatrix}$$

Realisation rules (members of the set RR) pair a set of morphological properties to be expressed, the morphology under discussion (MUD), with a list of morphs that realise them (MPH). Since MUD, being a set, admits multiple morphosyntactic properties, and since MPH, being a list, admits multiple exponents, realisation rules in fact establish *m* : *n* relations between function and form: thus, the many-to-many nature of inflectional morphology is captured at the most basic level. It is this very property that sets the present framework apart from cascaded rule models of inferential-realisational morphology (Anderson 1992; Stump 2001), which attain this property only indirectly, as a system: rules in these frameworks are *n* : 1 correspondences between function and form, but since rules in different rule blocks may express the same functions, the system as a whole can capture *m* : *n* relations.

(15) Morphological well-formedness:
$$\textit{word} \Rightarrow \begin{bmatrix}
\text{INFL} & \begin{bmatrix}
\text{MPH} & \boxed{1} \bigcirc \dots \bigcirc \boxed{n}\\
\text{MS} & \boxed{0}\ \boxed{a} \uplus \dots \uplus \boxed{z}\\[4pt]
\text{RR} & \left\{ \begin{bmatrix}\text{MUD} & \boxed{a}\\ \text{MS} & \boxed{0}\\ \text{MPH} & \boxed{1}\end{bmatrix}, \dots, \begin{bmatrix}\text{MUD} & \boxed{z}\\ \text{MS} & \boxed{0}\\ \text{MPH} & \boxed{n}\end{bmatrix} \right\}
\end{bmatrix}
\end{bmatrix}$$
Given two distinct levels of representation, the morphological word and the rules that license it, it is of course necessary to define how constraints contributed by realisation rules relate to the overall morphological makeup of the word. Realisation rules per se only provide recipes for matching morphosyntactic properties onto exponents and vice versa. In order to describe well-formed words, it is necessary to enforce that these recipes actually be applied. IbM regulates the relation between word-level properties and realisation rules by means of a rather straightforward principle, given in (15): this very general principle of morphological well-formedness ensures that the properties expressed by the rules add up to the word's property set, and that the rules' MPH lists add up to that of the word, such that no contribution of a rule may ever be lost. The principle of general well-formedness in (15) bears some resemblance to LFG's principles of completeness and coherence (Bresnan 1982a), as well as to the notion of "Total Accountability" proposed by Hockett (1947). Since *m* : *n* relations are recognised at the most basic level, i.e. morphological rules, mappings between the contributions of the rules and the properties of the word can (and should) be 1 : 1. We shall see below that this makes possible a formulation of morphological well-formedness in terms of exhaustion of the morphosyntactic property set.

In essence, a word's morphosyntactic property set (MS) will correspond to the non-trivial set union (⊎) of the rules' MUD values: while standard set union (∪) allows for the situation that elements contributed by two sets may be collapsed, non-trivial set union (⊎) insists that the sets to be unioned must be disjoint. The entire morphosyntactic property set of the word (MS) is visible on each realisation rule by way of structure sharing (tag [0] in (15)).

Finally, a word's sequence of morphs, and hence its phonology, will be obtained by shuffling (○) the rules' MPH lists in ascending order of position class (PC) indices (see Müller 2021a: 391, Chapter 10 of this volume for a definition of the shuffle relation, also known as sequence union). This is ensured by the Morph Ordering Principle given in (16), adapted from Crysmann & Bonami (2016).

(16) Morph Ordering Principle (MOP):

$$\begin{array}{l}
\text{a. } \textit{Concatenation:}\\[2pt]
\quad \textit{word} \Rightarrow \begin{bmatrix} \text{PH} & \boxed{1} \oplus \dots \oplus \boxed{n}\\[2pt] \text{INFL} & \begin{bmatrix}\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \boxed{1}\end{bmatrix}, \dots, \begin{bmatrix}\text{PH} & \boxed{n}\end{bmatrix} \right\rangle\end{bmatrix} \end{bmatrix}\\[12pt]
\text{b. } \textit{Order:}\\[2pt]
\quad \textit{word} \Rightarrow \neg \left( \begin{bmatrix} \text{INFL} & \begin{bmatrix}\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PC} & \boxed{1}\end{bmatrix}, \begin{bmatrix}\text{PC} & \boxed{2}\end{bmatrix}, \dots \right\rangle\end{bmatrix} \end{bmatrix} \wedge\ \boxed{1} \geq \boxed{2} \right)
\end{array}$$

While the first clause in (16a) merely states that the word's phonology is the concatenation of its constituent morphs, the second clause (16b) ensures that the order implied by position class indices (PC) is actually obeyed. Bonami & Crysmann (2013) provide a formalisation of morph ordering using list constraints.
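The joint effect of (15) and (16) can be sketched as a simple checker. The encoding is mine, and shuffle is approximated by a sort on PC indices, which is equivalent only when the indices are pairwise distinct.

```python
# Sketch of the word-level checks in (15)-(16) on the ge-setz-t example.

word = {
    "ph": "gesetzt",
    "ms": {("LID", "setzen"), ("TAM", "ppp")},
    "mph": [("ge", -1), ("setz", 0), ("t", 1)],               # (shape, PC)
    "rr": [
        {"mud": {("LID", "setzen")}, "mph": [("setz", 0)]},   # stem rule
        {"mud": {("TAM", "ppp")},    "mph": [("ge", -1), ("t", 1)]},
    ],
}

def well_formed(w):
    muds = [r["mud"] for r in w["rr"]]
    union = set().union(*muds)
    disjoint = sum(map(len, muds)) == len(union)      # non-trivial union
    exhausted = union == w["ms"]                      # MUDs add up to MS
    pooled = [m for r in w["rr"] for m in r["mph"]]
    shuffled = sorted(pooled, key=lambda m: m[1]) == w["mph"]
    ascending = all(a[1] < b[1] for a, b in zip(w["mph"], w["mph"][1:]))
    concatenated = "".join(ph for ph, _ in w["mph"]) == w["ph"]
    return all([disjoint, exhausted, shuffled, ascending, concatenated])

print(well_formed(word))   # True
```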

Given the very general nature of the well-formedness constraints and particularly the commitment to monotonicity embodied by (15), it is clear that most if not all of the actual morphological analysis will take place at the level of realisation rules.

# **4.2 Realisation rules**

The fact that IbM, in contrast to PFM or AM, recognises *m* : *n* relations between form and function at the most basic level of organisation, i.e. realisation rules, means that morphological generalisations can be expressed in a single place, namely simply as abstractions over rules. Rules in IbM are represented as descriptions of typed feature structures organised in an inheritance hierarchy, such that properties common to leaf types can be abstracted out into more general supertypes. This vertical abstraction is illustrated in Figure 5. Using again German past participles as an example, the commonalities that regular circumfixal *ge-...-t* (as in *gesetzt* 'put') shares with subregular *ge-...-en* (as in *geschrieben* 'written') can be generalised as the properties of a rule supertype from which the more specific leaves inherit. Note that essentially all information except the choice of suffixal shape is associated with the supertype. This includes the shared morphotactics of the suffix.

$$\begin{array}{c}
\begin{bmatrix}
\text{MUD} & \left\{ \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix} \right\}\\[4pt]
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\\ \text{PC} & -1\end{bmatrix}, \begin{bmatrix}\text{PC} & 1\end{bmatrix} \right\rangle
\end{bmatrix}\\[14pt]
\begin{bmatrix}\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\end{bmatrix} \right\rangle\end{bmatrix} \qquad\qquad \begin{bmatrix}\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PH} & \langle\text{en}\rangle\end{bmatrix} \right\rangle\end{bmatrix}
\end{array}$$

Figure 5: Vertical abstraction by inheritance

In addition to vertical abstraction by means of standard monotonic inheritance hierarchies, IbM draws on Online Type Construction (Koenig & Jurafsky 1995): using dynamic cross-classification, leaf types from one dimension are distributed over the leaf types of another dimension. This type of horizontal abstraction permits modelling of systematic alternations, as illustrated once more with German past participle formation in (17):

(17) a. *ge-setz-t* 'put'
b. *über-setz-t* 'translated'
c. *ge-schrieb-en* 'written'
d. *über-schrieb-en* 'overwritten'


In the more complete set of past participle formations shown in (17), we find alternation not only between choice of suffix shape (*-t* vs. *-en*), but also between presence vs. absence of the prefixal part (*ge-*).

Figure 6 shows how Online Type Construction provides a means to generalise these patterns in a straightforward way: while the common supertype still captures properties true of all four different realisations – namely the property to be expressed and the fact that it involves at least a suffix –, concrete prefixal and suffixal realisation patterns are segregated into dimensions of their own (indicated by PREF and SUFF). Systematic cross-classification (under unification) of types in PREF with those in SUFF yields the set of well-formed rule instances: e.g. distributing the left-hand rule type in PREF over the types in SUFF yields the rules for *ge-setz-t* and *ge-schrieb-en*, whereas distributing the right-hand rule type in PREF gives us the rules for *über-setz-t* and *über-schrieb-en*, which are characterised by the absence of the participial prefix.

$$\begin{array}{c}
\begin{bmatrix}
\text{MUD} & \left\{ \begin{bmatrix}\text{TAM} & \textit{ppp}\end{bmatrix} \right\}\\[4pt]
\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PC} & 1\end{bmatrix} \right\rangle
\end{bmatrix}\\[14pt]
\textsc{pref:}\quad \begin{bmatrix}\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ge}\rangle\\ \text{PC} & -1\end{bmatrix}, \dots \right\rangle\end{bmatrix} \quad \begin{bmatrix}\text{MPH} & \left\langle \dots \right\rangle\end{bmatrix} \qquad
\textsc{suff:}\quad \begin{bmatrix}\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PH} & \langle\text{t}\rangle\end{bmatrix} \right\rangle\end{bmatrix} \quad \begin{bmatrix}\text{MPH} & \left\langle \dots, \begin{bmatrix}\text{PH} & \langle\text{en}\rangle\end{bmatrix} \right\rangle\end{bmatrix}
\end{array}$$

Figure 6: Horizontal abstraction by dynamic cross-classification

Having illustrated how the kind of dynamic cross-classification offered by Online Type Construction is highly useful for the analysis of systematic alternation in morphology, it seems necessary to lay out its exact workings in a more precise fashion. In its original formulation by Koenig & Jurafsky (1995) and Koenig (1999), Online Type Construction was conceived as a closure operation on underspecified lexical type hierarchies. IbM merely redeploys their approach for the purposes of inflectional morphology. Essentially, a minimal type hierarchy as in Figure 6 provides instructions on the set of inferable subtypes: according to Koenig & Jurafsky (1995), dimensions are conjunctive and leaf types are disjunctive. Online Type Construction dictates that any maximal subtype must inherit from exactly one leaf type in each dimension. The maximal types of the hierarchy thus expanded serve as the basis for rule instances, i.e. actual rules.<sup>7</sup>
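A minimal sketch of this closure operation for the hierarchy in Figure 6, with illustrative encodings of my own:

```python
# Sketch of the closure over Figure 6: distributing the leaves of the
# PREF dimension over those of SUFF yields the four maximal rule types
# for the German past participle.
from itertools import product

SUPER_MUD = {("TAM", "ppp")}                       # common supertype

PREF = [("ge-pref", [("ge", -1)]),                 # ge- present
        ("no-pref", [])]                           # prefixless: ueber-setz-t
SUFF = [("t-suff",  [("t", 1)]),
        ("en-suff", [("en", 1)])]

def closure():
    """Each maximal type inherits from exactly one leaf per dimension."""
    for (pn, pm), (sn, sm) in product(PREF, SUFF):
        yield f"{pn} & {sn}", SUPER_MUD, sorted(pm + sm, key=lambda m: m[1])

for name, mud, morphs in closure():
    print(name, mud, morphs)
```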

# **4.3 Pāṇinian competition**

In accordance with most theories of inflection (Prince & Smolensky 1993; Stump 2001; Anderson 1992; Noyer 1992; Kiparsky 1985), IbM embraces a version of Morphological Blocking, also known as the Elsewhere Condition (Kiparsky 1985) or Pāṇini's Principle. The basic intuition behind Pāṇinian competition is that more specific rules can block the application of more general rules, where the most unspecific rule will count as a default. In terms of feature logic, the notion of specificity corresponds to some version of the subsumption relation.

<sup>7</sup>There are two ways of conceptualising the status of Online Type Construction in grammar: under the dynamic view, hierarchies are underspecified, and the full range of admissible types, and therefore the range of instances, is inferred online. Under the more conservative static view, the underspecified description is merely a convenient shortcut for the grammar writer. In either case, generalisations are preserved.

Competition between rules or lexical entries does not follow from the logic standardly assumed within HPSG: if a rule can apply, it will apply, no matter whether there are any more specific or more general rules that could have applied as well (in fact, they would apply as well). Thus, implementation of a notion of morphological blocking necessitates a change to the logic.

As has already been discussed in connection with Koenig (1999), preemption based on specificity of information can either be addressed statically (at "compile time") as an issue of knowledge representation or dynamically (at "run time") as a question of knowledge use. Independently of the choice between a static or dynamic version of preemption, the main task is to provide a notion of competitor. In the interest of representing Pāṇinian inferences transparently in the type hierarchy, IbM makes use of a closure operation on rule instances, as detailed in (18), which is clearly inspired by Koenig (1999) and Erjavec (1994).<sup>8</sup>

### (18) *Pāṇinian Competition (PAN)*


The first clause establishes competition, ensuring subsumption with respect to both expressed features (MUD) and conditioning features (MS descriptions).<sup>9</sup> If the condition in (18a) is met, the use conditions of the more general rule are specialised in such a way (18b) as to make the two rule descriptions fully disjoint.

For concreteness, let us consider some examples from Swahili: as shown in Table 21.4, the negative in Swahili is typically formed by a prefix *ha-*, preceding the equally prefixal exponents of subject agreement and tense (future *ta-*). However, in the negative first singular, discrete realisation of *ha-* and *ni-* is blocked by the portmanteau *si-*. Here, we have a classical case of Pāṇinian competition, where a rule that expresses both negative and first person singular agreement preempts application of the more general individual rules for negative or first person singular.

<sup>8</sup>Alternatively, for a dynamic approach, it will be sufficient to use clause (18a) and perform a topological sort on rule instances, ordering more specific rules before more general ones.

<sup>9</sup>Since MUD values can be of different cardinality, the subsumption is checked on open sets containing the original MUD sets.


Table 21.4: Future forms of the Swahili verb *taka* 'want'

|     | Affirmative | Negative      |
|-----|-------------|---------------|
| 1SG | ni-ta-taka  | si-ta-taka    |
| 2SG | u-ta-taka   | hu-ta-taka    |
| 3SG | a-ta-taka   | ha-ta-taka    |
| 1PL | tu-ta-taka  | ha-tu-ta-taka |
| 2PL | m-ta-taka   | ha-m-ta-taka  |
| 3PL | wa-ta-taka  | ha-wa-ta-taka |


In the case of *si-*, we find the portmanteau in the same surface position as the exponents it is in competition with. However, this need not be the case, nor indeed is preemption of this kind limited to adjacency: relative negative *si-*, for instance, is realised in a position following the subject agreement marker, yet still, by virtue of expressing negative in the context of relative marking, it blocks realisation of negative *ha-* in pre-agreement position, as shown in (19). This constitutes a case of what Noyer (1992) calls "discontinuous bleeding".

(19) b. watu wa-*si*-o-soma
people SBJ.PL.M/WA-NEG.REL-REL.PL.M/WA-read
'people who don't read'
c. * watu ha-wa-(*si*-)o-soma
people NEG-SBJ.PL.M/WA-NEG.REL-REL.PL.M/WA-read

The relevant realisation rules for *ha-*, *ni-* and the two markers *si-* can be formulated quite straightforwardly as in (20a–d). For expository purposes, I shall make explicit the fact that MUD is necessarily contained in MS.

(20)
$$\begin{array}{ll}
\text{a. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \{\textit{neg}\}\\
\text{MS} & \boxed{1} \cup \textit{set}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ha}\rangle\\ \text{PC} & 1\end{bmatrix} \right\rangle
\end{bmatrix} &
\text{b. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \left\{\begin{bmatrix}\textit{subj}\\ \text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}\right\}\\
\text{MS} & \boxed{1} \cup \textit{set}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ni}\rangle\\ \text{PC} & 2\end{bmatrix} \right\rangle
\end{bmatrix}\\[26pt]
\text{c. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \left\{\textit{neg}, \begin{bmatrix}\textit{subj}\\ \text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}\right\}\\
\text{MS} & \boxed{1} \cup \textit{set}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{si}\rangle\\ \text{PC} & 1..2\end{bmatrix} \right\rangle
\end{bmatrix} &
\text{d. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \{\textit{neg}\}\\
\text{MS} & \boxed{1} \cup \{\textit{rel}\} \cup \textit{set}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{si}\rangle\\ \text{PC} & 3\end{bmatrix} \right\rangle
\end{bmatrix}
\end{array}$$

On the basis of the definition in (18a), portmanteau *si-* in (20c)<sup>10</sup> is a competitor for both *ni-* (20b) and *ha-* (20a), since the MUD of portmanteau *si-* expands, i.e. is subsumed by, each of the sets containing the MUD value of *ni-* or *ha-*. Moreover, the MS value of portmanteau *si-* is properly subsumed by that of *ni-* (and *ha-*). Accordingly, the rule for *ni-* will be expanded as in (21a). Similarly, in a first iteration, *ha-* will be specialised as in (21b).

(21)
$$\begin{array}{ll}
\text{a. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \left\{\begin{bmatrix}\textit{subj}\\ \text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}\right\}\\
\text{MS} & \boxed{1} \cup \textit{set} \wedge \neg\{\textit{neg}, \dots\}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ni}\rangle\end{bmatrix} \right\rangle
\end{bmatrix} &
\text{b. } \begin{bmatrix}
\text{MUD} & \boxed{1}\ \{\textit{neg}\}\\
\text{MS} & \boxed{1} \cup \textit{set} \wedge \neg\left\{\begin{bmatrix}\textit{subj}\\ \text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}, \dots\right\}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ha}\rangle\end{bmatrix} \right\rangle
\end{bmatrix}
\end{array}$$

However, *ha-* (20a) has another competitor, namely negative relative *si-* (20d): while in this case the MUD values are equally informative, the rules differ in terms of their MS descriptions, with *si-* being conditioned on relative and *ha-* being unconditioned. Expansion by Pāṇinian competition will add another existential constraint to (21b). The fully expanded entry is given in (22).

(22)
$$\begin{bmatrix}
\text{MUD} & \boxed{1}\ \{\textit{neg}\}\\
\text{MS} & \boxed{1} \cup \textit{set} \wedge \neg\left\{\begin{bmatrix}\textit{subj}\\ \text{PER} & 1\\ \text{NUM} & \textit{sg}\end{bmatrix}, \dots\right\} \wedge \neg\{\textit{rel}, \dots\}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ha}\rangle\\ \text{PC} & 1\end{bmatrix} \right\rangle
\end{bmatrix}$$
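The effect of the closure in (18) on the rules in (20) can be emulated as follows; the flat encoding of feature bundles as strings is a simplification of mine. Note also that a hypothetical rule for past *ku-* with MUD {past} but conditioned on *neg*, as in (25) below, would never be treated as a competitor of *ha-*, since the MUD sets are disjoint.

```python
# Sketch of the Pāṇinian closure in (18) over the Swahili rules in (20):
# a rule preempts a more general one if its MUD and conditioning MS are
# at least as specific; the general rule is then specialised with a
# negative constraint so that the two descriptions become disjoint.

RULES = {
    "ha": {"mud": {"neg"}, "cond": set(), "neg_constraints": []},
    "ni": {"mud": {"subj:1sg"}, "cond": set(), "neg_constraints": []},
    "si_portmanteau": {"mud": {"neg", "subj:1sg"}, "cond": set(),
                       "neg_constraints": []},
    "si_relative": {"mud": {"neg"}, "cond": {"rel"}, "neg_constraints": []},
}

def more_specific(r1, r2):
    """r1 preempts r2: r2's MUD and conditions subsumed by r1's, properly."""
    return (r2["mud"] <= r1["mud"] and r2["cond"] <= r1["cond"]
            and (r1["mud"], r1["cond"]) != (r2["mud"], r2["cond"]))

def panini_closure(rules):
    for n1, r1 in rules.items():
        for n2, r2 in rules.items():
            if n1 != n2 and more_specific(r1, r2):
                extra = (r1["mud"] | r1["cond"]) - (r2["mud"] | r2["cond"])
                r2["neg_constraints"].append(extra)

panini_closure(RULES)
print(RULES["ha"]["neg_constraints"])   # [{'subj:1sg'}, {'rel'}] : cf. (22)
print(RULES["ni"]["neg_constraints"])   # [{'neg'}]               : cf. (21a)
```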

<sup>10</sup>IbM uses the notation *..* to represent spans of position classes. See Bonami & Crysmann (2013) for a proposal of how spans can be made explicit.

A common case of default realisation is zero exponence: as illustrated by the German nominal paradigms in Table 21.1, only a small number of the cells feature overt exponents. For example, in the paradigm of *Oma* 'granny' (Table 21.1a), singular number is solely expressed by the significant absence of any exponents. Particularly relevant to the case of default zero realisation are the paradigms exhibiting a Pāṇinian split, e.g. that of *Rechner* 'computer': here, only two cells are actually marked with a specific exponent (genitive singular and dative plural); all others are zero-marked and receive their interpretation by means of paradigmatic contrast. In order to allow for the possibility of zero realisation and to lend it the status of an ultimate default in the absence of any overt realisation, realisational approaches such as AM and PFM assume that every rule block returns an unmodified base, unless preempted by a more specific rule. In PFM, this property is ensured by the Identity Function Default (IFD) (Stump 2001: 53). Having a default principle such as the IFD is economical in that it saves restating the identity function for every rule block. On the downside, the IFD as a metalevel default will always be able to apply, possibly making an account of gaps in paradigms more difficult. In IbM, zero exponence is captured by providing rule types that contribute an empty list of morphs, as shown in Figure 7 below. With an underspecified MUD value, such a rule type may act as a default realisation.

One assertion that has been made repeatedly in IbM work concerns default zero exponence, the thesis being that there is need for only a single instance. The current formulation of Pāṇini's principle works as desired within an inflectional dimension, e.g. tense or polarity, but not for a rule that has a fully underspecified MUD element, since such a rule would only be applicable if neither tense nor polarity had a non-default value. The rule for zero exponence suggested in Crysmann & Bonami (2016), for example, realises a property (one underspecified element on MUD) without contributing any morph, as shown in Figure 7.

Figure 7: Default zero realisation ((a) simple type; (b) simple type with more specific subtypes)

A simple solution is to provide subtypes of the ultimate default for every inflectional dimension that witnesses zero exponence: the rule type in Figure 7a, for instance, could be specialised by adding appropriate subtypes, e.g. for tense and polarity, as in Figure 7b. While this is slightly less general than what might have been hoped for, the finer control that this move provides is independently required to strike the right analytical balance between zero exponence as a fallback strategy and the existence of defectiveness, i.e. gaps in paradigms.<sup>11</sup>

Having seen how Pāṇinian competition can be made explicit, we shall briefly have a look at how this global principle interacts with multiple and overlapping exponence.

Let us start with overlapping exponence, which is much more common than pure multiple exponence. As witnessed by the Swahili examples in (23) and (24), the regular exponent of negation combines with tense markers for past and future. However, while the exponent for future is constant across affirmative and negative (23), the negative past marker *ku-* in (24) displays overlapping exponence.

(23) a. tu-ta-taka 1PL-FUT-want
'we will want'
b. ha-tu-ta-taka NEG-1PL-FUT-want
'we will not want'

(24) a. tu-li-taka 1PL-PST-want
'we wanted'
b. *(ha-)tu-ku-taka NEG-1PL-PST.NEG-want
'we did not want'

There are, in principle, two ways to picture cases of overlapping exponence as in (24b): either *ku-* is regarded as cumulation of negative and past, or else it is an exponent of past, allomorphically conditioned by the negative. Following Carstairs (1987), IbM embraces a notion of inflectional allomorphy by way of distinguishing between expression of a feature and conditioning by some feature.

<sup>11</sup>Alternatively, this expansion could be inferred from the grammar, based on declarations of appropriate morpho-syntactic property sets (MS values). All it takes is to expand, prior to Pāṇinian inference, any leaf rule type by intersecting its MUD value with the value of the appropriateness function for MS. See Diaz et al. (2019) for an example of such a declaration. As a result, fully underspecified MUD values will be expanded into the minimal types appropriate for each dimension of the paradigm, yielding an expanded hierarchy of rule types as in Figure 7b that will give sound results under Pāṇinian competition.


(25)
$$\begin{array}{ll}
\text{a. } \begin{bmatrix}
\text{MUD} & \{\textit{past}\}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{li}\rangle\\ \text{PC} & 3\end{bmatrix} \right\rangle
\end{bmatrix} &
\text{b. } \begin{bmatrix}
\text{MUD} & \{\textit{past}\}\\
\text{MS} & \{\textit{neg}\} \cup \textit{set}\\
\text{MPH} & \left\langle \begin{bmatrix}\text{PH} & \langle\text{ku}\rangle\\ \text{PC} & 3\end{bmatrix} \right\rangle
\end{bmatrix}
\end{array}$$

We can provide rules for the two past tense markers as given in (25), where *ku-* is additionally conditioned on the presence of *neg* in the morphosyntactic property set (MS). While these two rules stand in Pāṇinian competition with each other, rule (25b) is crucially not a competitor for the regular negative marker *ha-* (22), since the MUD sets of (25b) and (22) are actually disjoint. Thus, by embracing a distinction between expression and conditioning, overlapping exponence behaves as expected with respect to Pāṇini's principle.

Pure multiple exponence works somewhat differently from overlapping exponence: in Nyanja (Stump 2001; Crysmann 2017a), class B adjectives, such as *kulu* 'large' in (26a), take two class markers to mark agreement with the head noun, one set of markers being the one normally used with class A adjectives, such as *bwino* 'good' in (26b), the other being attested with verbs, such as *kula* 'grow' in (26c). Both sets distinguish the same properties, i.e. nominal class.<sup>12</sup>

(26) a. ci-manga CL7-maize ca-ci-kulu QUAL7-CONC7-large
'large maize'
b. ci-manga CL7-maize ca-bwino QUAL7-good
'good maize'
c. ci-lombo CL7-weed ci-kula CONC7-grow
'A weed grows.'

Crysmann (2017a) shows that double inflection as in Nyanja can be captured by composing rules of exponence for verbs and type A adjectives to yield the complex rules for type B adjectives, as shown in Figure 8.

The difference in treatment for overlapping and pure multiple exponence of course raises the question of whether or not the two approaches should be harmonised. The only way to do this would be to generalise the Nyanja case to overlapping exponence, by way of treating all such cases by means of composing rules. While possible in general, there is a clear downside to such a move: as we saw in the discussion of Swahili above, there is not only a dependency between negative and past tense, but also between negative *si-* and relative marking. As a result, one would end up organising negation, tense and relative marking into a single cross-cutting multi-dimensional type hierarchy. Inflectional allomorphy, by contrast, supports a much more modularised perspective, which greatly simplifies the specification of the grammar.

<sup>12</sup>The examples in (26) are taken from Stump (2001: 6).

# **4.4 Morphotactics**

The treatment of morphotactically complex systems, as found in e.g. position class systems, was one of the major motivations behind the development of IbM. With the aim of providing a formal model of complex morph ordering that matches the parsimony of the traditional descriptive template, Crysmann & Bonami (2016) discarded the cascaded rule model adopted by e.g. PFM (Stump 2001).<sup>13</sup> Instead, order is directly represented as a property of exponents.

Taking as a starting point the classical challenges from Stump (1993) – portmanteau, ambifixal, reversed, and parallel position classes – they developed an extended typology of variable morphotactics, i.e. systems which depart from the kind of rigid ordering more commonly found in morphological systems.

Table 21.5: Masculine singular forms of the Nepali verb BIRSANU 'forget'


One of the simplest deviations from strict and invariable ordering is misaligned placement: while exponents that mark alternative values for the same feature, and which therefore stand in paradigmatic opposition, tend to occur in the same position, this is not always the case, as illustrated by the example from Nepali in Table 21.5. While the agreement markers (in italics) follow the tense marker (bold) in the present, the relative order of tense and agreement marker differs from cell to cell in the future (LOW and MID constitute different levels in the system of honorifics).

<sup>13</sup>Crysmann & Bonami (2012) was a conservative extension of PFM with reified position class indices, an approach that was rendered obsolete by subsequent work.

Figure 9: Nepali tense and agreement marking

If position class indices are part of the descriptive inventory, an account of apparently reversed position classes (Stump 1993) becomes almost trivial, as shown in Figure 9: all it takes is to assign the present marker an index that precedes all agreement markers and assign the future marker an index that precedes some agreement markers, but not others.

A slightly more complex case is conditioned placement: in contrast to misaligned placement, assignment of position does not just depend on the properties expressed by the marker itself, but on some additional property. An example of this is Swahili "ambifixal" relative marking, as shown in examples (27)–(28).<sup>14</sup> In the affirmative indefinite tense, the relative marker is realised in a position after the stem, whereas in all other cases it precedes it.

(27) a. a-soma-*ye* M/WA.SG-read-M/WA.SG.REL

'(person) who reads'

b. a-ki-soma-*cho* M/WA.SG-KI/VI.SG.O-read-KI/VI.SG.REL '(book) which he reads'

(Swahili)

<sup>14</sup>Conditioned placement is not only attested on alternate sides of the stem, as discussed for Swahili in Stump (1993), but also on the same side. See the discussion of mesoclisis in European Portuguese in Crysmann & Bonami (2016).


(Swahili)

(28) a. a-na-*ye*-soma M/WA.SG-PRES-M/WA.SG.REL-read '(person) who is reading'

b. a-na-*cho*-ki-soma M/WA.SG-PRES-KI/VI.REL-KI/VI.SG.O-read '(book) which he is reading'

Conditioned placement can be captured using a two-dimensional hierarchy, as shown in Figure 10: the MORPHOTACTICS dimension on the left defines the conditions for the corresponding placement constraints, whereas the EXPONENCE dimension provides the constraints on the shape of the 16 relative class markers that undergo the alternation. Cross-classification by means of Online Type Construction finally distributes the morphotactic constraints over the rules of exponence.
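The effect of cross-classification can be illustrated with a small sketch. Assuming a toy encoding of leaf types as flat constraint dictionaries (a deliberate simplification of typed feature structures), Online Type Construction amounts to unifying one leaf from each dimension, which distributes the placement constraints over the rules of exponence; only a 2×2 fragment of the 16 markers is shown, and all names are illustrative.

```python
from itertools import product

def unify(a, b):
    """Unify two flat constraint dicts; fail (None) on conflicting values."""
    merged = dict(a)
    for key, val in b.items():
        if key in merged and merged[key] != val:
            return None
        merged[key] = val
    return merged

# MORPHOTACTICS dimension: where the relative marker goes, and when.
MORPHOTACTICS = [
    {"context": "affirmative indefinite", "position": "post-stem"},
    {"context": "elsewhere",              "position": "pre-stem"},
]

# EXPONENCE dimension: shapes of two of the relative class markers.
EXPONENCE = [
    {"class": "M/WA.SG",  "shape": "ye"},
    {"class": "KI/VI.SG", "shape": "cho"},
]

# Every consistent combination of one leaf per dimension yields a fully
# specified rule type, as in the cross-classified hierarchy of Figure 10.
rules = [r for a, b in product(MORPHOTACTICS, EXPONENCE)
         if (r := unify(a, b)) is not None]
for r in rules:
    print(r)   # four cross-classified rule types for the 2x2 fragment
```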

Figure 10: Swahili relative markers

The last basic type of variable morphotactics is free placement, i.e. free permutation of a circumscribed number of markers. This is attested e.g. in Chintang (Bickel et al. 2007) and in Mari (Luutonen 1997).

In Mari, as shown in Table 21.6, markers of core cases follow the possessive marker and exponents of the lative cases precede it, while the dative marker permits both relative orders. Free permutation appears to present a challenge for cascaded rule models, such as PFM, whereas an analysis is almost trivial in IbM, as position can simply be underspecified.

### **Relative placement**

Inflectional morphology does not provide much evidence for internal structure. This is recognised in IbM by representing morphs on a flat list with simple position class indices. While simple indexing by absolute position is often sufficient, there are cases where a more sophisticated indexing scheme is called for.

Table 21.6: Selected singular forms of the Mari noun *pört* 'house'

| | ABSOLUTE | 1PL POSSESSED, POSS ≺ CASE | 1PL POSSESSED, CASE ≺ POSS |
|:--|:--|:--|:--|
| NOM | pört | pört-**na** | |
| GEN | pört-*ən* | pört-**na**-*n* | * |
| ACC | pört-*əm* | pört-**na**-*m* | * |
| DAT | pört-*lan* | pört-**na**-*lan* | pört-*lan*-**na** |
| LAT | pört-*eš* | * | pört-*eš*-**na** |
| ILL | pört-*əš(kö)* | * | pört-*əškə*-**na** |

Crysmann & Bonami (2016) discuss the placement of pronominal affix clusters in Italian. While order is constant within the cluster of pronominal affixes itself (*me-lo-* in the example below), as well as between the stem and the tense and agreement affixes, the linearisation of the cluster as a whole is variable, as shown by the alternation between indicative and imperative in (29).


An important question raised by the Italian facts is whether morphotactics is in need of a more layered structure. If so, it will certainly not be the kind of structure provided by stem-centric cascaded rule approaches, like PFM, since it is the cluster that alternates between pre-stem and post-stem position, not the individual cluster members, which would yield mirroring.<sup>15</sup>

Crysmann & Bonami (2016) assume that it is the stem which is mobile in Italian and takes the exponents of tense and subject agreement along. To implement this, they show that it is sufficient to expose the positional index of the stem (the feature STM-PC in Figure 11), such that other markers can be placed relative to this pivot (cf. the agreement rule in Figure 12).

Compared to a layered structure, the pivot feature approach just described appears to be more versatile, since it provides a suitable solution to other cases of relative placement, such as second position affixes.

<sup>15</sup>See, however, Spencer (2005) for a variant of PFM that directly composes clusters.


Figure 11: Partial hierarchy of Italian stem realisation rules

Figure 12: Partial hierarchy of Italian affixal realisation rules

Sorani Kurdish endoclitic agreement markers surface after the initial morph, be it the stem or some prefixal marker (Samvelian 2007). Thus, placement is relative to whatever happens to be the first *instantiated* position index.

Bonami & Crysmann (2013) propose a pivot feature 1ST-PC that is instantiated to the position class index of the first element on the word's MPH list and exposed on all other morphs by the principle in (30).

$$\text{(30)}\quad\textit{word}\Rightarrow\begin{bmatrix}\text{MPH}\;\left\langle\begin{bmatrix}\text{PC}\;\boxed{1}\\ \text{1ST-PC}\;\boxed{1}\\ \text{STM-PC}\;\boxed{2}\end{bmatrix},\begin{bmatrix}\text{1ST-PC}\;\boxed{1}\\ \text{STM-PC}\;\boxed{2}\end{bmatrix},\dots,\begin{bmatrix}\text{1ST-PC}\;\boxed{1}\\ \text{STM-PC}\;\boxed{2}\end{bmatrix}\right\rangle\end{bmatrix}$$



Table 21.7: Sorani Kurdish past person markers

What this principle does is distribute two critical position class indices over every element of the MPH list: one for the position of the stem, in order to capture stem-relative vs. absolute placement as in Italian, the other for the lowest instantiated position class index, to capture second position phenomena.

The realisation rule for a second position clitic can then be formulated as in (31), determining its PC value relative to that of the word's first morph. I use an arithmetic operator here as a convenient shortcut, but note that indices are actually represented as lists underlyingly (Bonami & Crysmann 2013).

$$\text{(31)}\quad\begin{bmatrix}\text{MUD}\;\left\langle\begin{bmatrix}\text{PER}\;3\\ \text{NUM}\;\textit{pl}\end{bmatrix}\right\rangle\\ \text{MPH}\;\left\langle\begin{bmatrix}\text{PH}\;\langle\textit{jân}\rangle\\ \text{PC}\;\boxed{1}+1\\ \text{1ST-PC}\;\boxed{1}\end{bmatrix}\right\rangle\end{bmatrix}$$

For illustration, consider the two word forms *nard=jân-im* 'they sent me' and *da=jân-nard-im* 'they were sending me' from Table 21.7. The first one consists of two positionally fixed morphs, the stem in position 5 and the person ending in position 7. According to (30), 1ST-PC will be token-identical to the PC of *nard*, so *=jân* will be assigned a PC value of 6. The second example, *da=jân-nard-im*, has the additional progressive prefix *da-* in position 3, which is the lowest PC index of the word. Accordingly, *=jân* is placed relative to the prefix *da-*, in position 4.
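The placement logic of this worked example can be replayed in a few lines of Python; the integer position indices (stem 5, person ending 7, progressive prefix 3) follow the text, while the `place_second_position` helper is an illustrative abstraction over the list-valued indices that Bonami & Crysmann (2013) actually use.

```python
def place_second_position(fixed_morphs, clitic):
    """Instantiate 1ST-PC as the lowest PC on the word's morph list, as in
    (30), and place the clitic right after that first morph (1ST-PC + 1),
    as in (31). Morphs are (shape, PC) pairs, returned in linear order."""
    first_pc = min(pc for _, pc in fixed_morphs)
    morphs = fixed_morphs + [(clitic, first_pc + 1)]
    return sorted(morphs, key=lambda m: m[1])

# nard=jân-im 'they sent me': stem in 5, ending in 7, so =jân lands in 6.
print(place_second_position([("nard", 5), ("im", 7)], "jân"))

# da=jân-nard-im 'they were sending me': da- in 3 is now the lowest index,
# so =jân is placed relative to the prefix, in position 4.
print(place_second_position([("da", 3), ("nard", 5), ("im", 7)], "jân"))
```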

To conclude the section, a more general remark is in order: as we have seen, IbM uses explicit position indices to constrain morphotactic position. In essence, these correspond to linear distribution classes, where higher indices are realised to the right of lower indices and no two morphs within a word may bear the same index, resulting in competition for linear position. As a consequence, there is no static notion of a slot: while morphs are ordered according to indices, there is no requirement for indices to be consecutive. Thus, nothing much needs to be said about empty slots, except that there happens to be no morph in the word with that particular positional index.

# **4.5 Constructional vs. generative views**

IbM departs from previous, purely word-based approaches, such as Blevins (2016) or, within HPSG, Koenig (1999: Section 5.2.2), by recognising an intermediate level of realisation rules that effect the actual *m*:*n* relations between form and function. In this section, I shall discuss how this facilitates partial generalisations over gestalt exponence, provides for a better reuse of resources, as witnessed by parallel inflection, and finally ensures a modular organisation of rules of exponence.

### **4.5.1 Gestalt exponence**

One of the strongest arguments for the word-based view and against a generative rule-based approach comes from so-called gestalt exponence in Estonian (Blevins 2005). As shown in Table 21.8, core cases in this language give rise to case/number paradigms where (almost) all cells are properly distinguished by clearly segmentable markers, yet there is no straightforward association between the markers and the properties they express.

Table 21.8: Partial paradigms exemplifying three Estonian noun declensions (core cases; Blevins et al. 2016: 287)



The gestalt nature of Estonian case/number marking can be schematised as in Figure 13.


Figure 13: *m*:*n* relations in Estonian

While it is clear that this kind of complex association between form and function requires a constructional perspective, it is far from evident (i) that this association has to be made at the level of the word rather than at the level of *m*:*n* rules and (ii) that this therefore requires word-to-word correspondences in the sense of Blevins (2005; 2016).<sup>16</sup> To the contrary, the system depicted in Table 21.8 displays partial generalisations that are hard to capture in a system such as Blevins': e.g. theme vowels are found in all cells except the nominative singular, only the nominative singular is monomorphic, and all plural forms are tri-morphic, to name just a few.

In IbM, *m*:*n* correspondences are established at the level of realisation rules, and these realisation rules are organised into (cross-classifying) type hierarchies. Crysmann & Bonami (2017) argue that this makes it possible to extract the kind of partial generalisations noted in the previous paragraph and represent them in a three-dimensional type hierarchy that specifies constraints on stem selection independently of theme-vowel introduction and suffixation. Using pre-typing, idiosyncratic aspects can be contained, while more regular aspects, such as theme vowel and stem selection, are taken care of by Online Type Construction.

Furthermore, encapsulating gestalt exponence as a subsystem of realisation rules has the added advantage that it does not spill over into the rest of the Estonian inflectional system, which, Estonian being a Finno-Ugric language, is otherwise highly agglutinative.

Composing complex pairings of morphological forms and functions by means of cross-classification of partial rule descriptions is not only beneficial to the treatment of gestalt exponence, but also lends itself more generally to capturing syntagmatic dependencies between exponents: see Crysmann (2021) for dependent agreement markers in Batsbi, and Crysmann (2020) for discontinuous morphotactic dependencies in Yimas.

While it is straightforward to implement constructional analyses within IbM, involving complex *m*:*n* relations between form and function, non-constructional analyses are actually preferred whenever possible, generally yielding much more parsimonious descriptions.

<sup>16</sup>See also Guzmán Naranjo (2019) for a formalisation of word-based morphology in HPSG.


### **4.5.2 Reuse of resources**

Reuse of resources constitutes a particularly strong argument against over-generalising to the constructional, or word-based, view: parallel position classes are a case in point, as exemplified in Swahili (Stump 1993; Crysmann & Bonami 2016) or Choctaw (Broadwell 2017).

Table 21.9: Swahili person markers (Stump 1993: 143)


Consider the paradigms of Swahili subject and object agreement markers in Table 21.9: as one can easily establish, agreement markers draw largely on the same set of shapes. Grammatical function is disambiguated mainly by position, with subject agreement placed to the left of tense markers, and object agreement to the right.

Under a constructional approach, such as the word-based analysis in Koenig (1999), the generalisation about identity of shapes is essentially lost, which is due to the fact that under this view, markers that can potentially combine must be introduced in different cross-classifying dimensions, e.g. one for subject marking in slot 2, the other for object marking in slot 5. Likewise, in order to distribute shape constraints over subject and object agreement, they must constitute yet another cross-cutting dimension, but there is simply no way in this set-up to enforce that every shape constraint must be evaluated twice.

However, once we move from word-based statements to realisation rules, the problem simply vanishes, since we are not trying to solve the problems of parallelism of exponence and combination at the same time. As illustrated in Figure 14, constraints about shape can be straightforwardly distributed over realisation rules for subject and object agreement (which are types), because their combination is effectively factored out.


Figure 14: Rule type hierarchy for Swahili parallel position classes (Crysmann & Bonami 2016: 356)

Thus, by abstracting over rules instead of words, generalisations regarding parallel sets of exponents can be captured quite easily. Sharing of resources is in fact a more general problem that tends to get overlooked by radically word-based approaches such as Blevins (2016).

### **4.5.3 Modularity**

The final argument for combining constructional or holistic with generative or atomistic views is that it provides for a divide and conquer approach to complex inflectional systems.

Diaz et al. (2019) discuss the pre-pronominal affix cluster in Oneida, an Iroquoian language. Oneida presents us with what is probably the most complex morphotactic system that has been described so far within IbM.

Oneida is a highly polysynthetic language. According to Diaz et al. (2019), the prefixal inflectional system alone comprises seven position classes in which up to eight non-modal and three modal categories can be expressed (cf. Table 21.10). Given the number of categories and positions alone, it comes as no surprise that the system is characterised by heavy competition. Adding to the complexity, several markers undergo complex interactions, even between non-adjacent slots. Finally, Oneida pre-pronominal prefixes also display variable morphotactics: the factual, for instance, appears in four different surface positions, and the optative in three. Moreover, we find paradigmatic misalignment (cf. the discussion of Nepali above), with the cislocative in a different surface position from the translocative.


Table 21.10: Position classes of Oneida inflectional prefixes (Diaz et al. 2019: 435)

Diaz et al. (2019) discuss three different types of interaction within the system: (i) positional competition, exhibited in slot 1 (negative, contrastive, coincidental, partitive) and slot 5 (cislocative, repetitive); (ii) borrowing, a particular case of extended exponence exhibited in slot 2 (the translocative borrowing vowels from the future and factual); and (iii) sharing, witnessed by the factual and the optative, which are distributed across different positions. Cross-cutting these subsystems, we find a great deal of contextual inflectional allomorphy.

Diaz et al. (2019) contain the complexity of the system by building on several key notions, the first three of which are integral parts of IbM: first, the fact that IbM recognises *m*:*n* relations at the rule level makes it possible to approach the Oneida system in a more modular fashion, carving out four independent subsystems for competition (slot 1 and slot 5), borrowing (slot 2), and sharing (factual). Second, they draw on the distinction between realisation (MUD) and conditioning (MS) to abstract out inflectional allomorphy. Third, they capture discontinuous exponence of the factual and optative in terms of Koenig/Jurafsky-style cross-classification in order to derive complex discontinuous rules.

The two innovative aspects of their analysis concern the treatment of competition and an abstraction over morphosyntactic properties in terms of syntagmatic classes. Oneida resolves morphotactic competition of semantically compatible features (slots 1 and 5) by means of a markedness hierarchy: features that are outranked on this hierarchy are optionally interpreted if the exponent of a higher feature is present. For example, the negative outranks the partitive, so if the negative marker is present, it can be interpreted as negative or as negative and partitive. If, by contrast, the partitive marker is found, the negative cannot be understood. Diaz et al. (2019) approach this by modelling the ranking in terms of a type hierarchy upon which realisation rules can draw. Their second innovation, i.e. the segregation of morphosemantic properties according to the positional properties of their exponents into e.g. inner or outer types, has enabled them to give a much more concise representation of allomorphy that can abstract over strata of positions.

The combination of the design properties of IbM with their two innovations has permitted Diaz et al. (2019) to provide an explicit and surprisingly concise analysis of an extremely complex system: in essence, their highly modular analysis (with only 36 rules) reduces the number of allomorphs by a factor of ten.

In sum, having *m*:*n* relations at the most basic level of realisation rules means that constructional views can be implemented at any level of granularity, combining reuse and recombination, as favoured by an atomistic (generative) view, with the holistic (constructional) view necessitated by discontinuous or gestalt exponence. To quote Diaz et al. (2019), "IbM's approach to morphology [...] is something unification-based approaches to syntax have stressed for the last forty years or so". In addition to the model-theoretic aspect they capitalise on, the similarity of IbM to current HPSG syntax also pertains to the fact that both integrate lexicalist and constructional views.

# **5 Conclusion**

This chapter has provided an overview of HPSG work in two core areas of morphology, namely derivation and inflection. The focus of this chapter was biased to some degree towards inflection, for two reasons: on the one hand, a handbook article that provides a more balanced representation of derivational and inflectional work in constraint-based grammar was published quite recently (Bonami & Crysmann 2016), while on the other, a comprehensive introduction to recent developments within HPSG inflectional morphology was still missing.

In the area of derivation and grammatical function change, a consensus was reached relatively early, toward the end of the last century, with the works of Riehemann (1998) and Koenig (1999): within HPSG, it is now clearly understood that lexical rules are description-level devices organised into cross-cutting inheritance type hierarchies. One of the distinctive advantages of these approaches is the possibility of capturing regular, subregular, and irregular formations using a single unified formal framework, namely partial descriptions of typed feature structures. Beyond HPSG, these works have influenced the development of Construction Morphology (Booij 2010).<sup>17</sup>

<sup>17</sup>See Müller (2021b), Chapter 32 of this volume, for a comparison of HPSG with Construction Grammar.


Much more recently, a consensus model seems to have emerged for the treatment of inflectional morphology. Information-based Morphology (Crysmann & Bonami 2016; Crysmann 2017a) builds on previous work on inflectional morphology in HPSG (Bonami & Boyé 2006), Online Type Construction (Koenig 1999), morph-based morphology (Crysmann 2003), and finally unification-based approaches to Pāṇini's principle (Andrews 1990; Erjavec 1994; Koenig 1999) to provide an inferential-realisational theory of morphology that exploits the same logic as HPSG, namely typed feature structure inheritance networks, to capture linguistic generalisations. Furthermore, like its syntactic parent, it permits striking a balance between lexicalist and constructional views. By recognising *m*:*n* relations between function and form at the most basic level, i.e. realisation rules, morphological generalisations are uniformly captured in terms of partial rule descriptions.

# **Acknowledgements**

I am greatly indebted to Andrew Spencer and Olivier Bonami for their comments and remarks on this chapter. A great many thanks also go to the editors of the handbook, in particular to Jean-Pierre Koenig for their helpful suggestions. Finally, I also would like to express my gratitude to the typesetters who helped to make the rendering of complex hierarchies much more consistent and visually appealing.

This work has been carried out in the context of the cluster of excellence *Empirical Foundations of Linguistics*, which is supported by a public grant overseen by the French National Research Agency (ANR) as part of the program "Investissements d'Avenir" (reference: ANR-10-LABX-0083). It contributes to the IdEx Université de Paris - ANR-18-IDEX-0001.

# **References**




# **Chapter 22**

# **Semantics**

Jean-Pierre Koenig, University at Buffalo

Frank Richter, Goethe Universität Frankfurt

This chapter discusses the integration of theories of semantic representations into HPSG. It focuses on those aspects that are specific to HPSG and, in particular, recent approaches that make use of underspecified semantic representations, as they are quite unique to HPSG.

# **1 Introduction**

A semantic level of description is more integrated into the architecture of HPSG than in many frameworks (although, in the last couple of decades, the integration of syntax and semantics has become tighter overall; see Heim & Kratzer 1998 for Mainstream Generative Grammar<sup>1</sup>, for example). Every node in a syntactic tree includes all appropriate levels of structure, phonology, syntax, semantics, and pragmatics so that *local* interaction between all these levels is in principle possible within the HPSG architecture. The architecture of HPSG thus follows the spirit of the rule-to-rule approach advocated in Bach (1976) and more specifically Klein & Sag (1985) to have every syntactic operation matched by a semantic operation (the latter, of course, follows the Categorial Grammar lead, broadly speaking; Ajdukiewicz 1935; Pollard 1984; Steedman 2000). But, as we shall see, only the spirit of the rule-to-rule approach is adhered to, as there can be more than one semantic operation per class of syntactic structures, depending on the semantic properties of the expressions that are syntactically composed.

<sup>1</sup>We follow Culicover & Jackendoff (2005: 3) in using the term *Mainstream Generative Grammar* (MGG) to refer to work in Government & Binding or Minimalism.

Jean-Pierre Koenig & Frank Richter. 2021. Semantics. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1001–1042. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599862


The built-in interaction between syntax and semantics within HPSG is evidenced by the fact that Pollard & Sag (1987), the first book-length introduction to HPSG, spends a fair amount of time on semantics and ontological issues, much more than was customary in syntax-oriented books at the time.

But despite the centrality of semantics within the HPSG architecture, not much comprehensive work on the interface between syntax and semantics was done until the late 90s, if we exclude work on the association of semantic arguments to syntactic valents in the early 90s (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). The formal architecture was ripe for research on the interface between syntax and semantics, but comparatively few scholars stepped in. Early work on semantics in HPSG investigated scoping issues, as HPSG's surface-oriented syntax presents interesting challenges when modeling alternative scope relations. This is what Pollard & Sag (1987; 1994) focus on most. The scope of modifiers is another area that was of importance and received attention for the same reason, both in Pollard & Sag (1994) and in Kasper (1997). Ginzburg & Sag (2000) is the first study not devoted to argument structure to leverage the syntactic architecture of HPSG to model the semantics of a particular area of grammar, in this case interrogatives.

The real innovation HPSG brought to the interface between syntax and semantics is the use of underspecification in Minimal Recursion Semantics (Copestake, Flickinger, Malouf, Riehemann & Sag 1995; Egg 1998; Copestake, Lascarides & Flickinger 2001; Copestake, Flickinger, Pollard & Sag 2005) and Lexical Resource Semantics (Richter & Sailer 2004b); see also Nerbonne (1993) for an early use of scope underspecification in HPSG. The critical distinction between grammars as descriptions of admissible structures and models of these descriptions makes it possible to have a new way of thinking about the meaning contributions of lexical entries and constructional entries: underspecification is the other side of descriptions.

# **2 A situation semantics beginning**

The semantic side of HPSG was initially rooted in Situation Semantics (Pollard & Sag 1987: Chapter 4; on Situation Semantics, see Barwise & Perry 1983). The choice of Situation Semantics is probably somewhat a matter of happenstance, and overall, nothing too crucial depended on that choice (and other choices have been explored since, as we detail below). However, this statement should not be construed as implying the choice was inconsequential. There were several interesting aspects of this choice for the study of the interface between syntax and semantics that is integral to any grammatical framework.


We briefly mention a few here. A first interesting aspect of this choice is that the identification of arguments was not through an ordering but via keywords standing for role names, something that made it easier to model argument structure in subsequent work (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume). A second aspect is the built-in "intensionality" of Situation Semantics. Since atomic formulas in Situation Semantics denote circumstances rather than truth values, and circumstances are more finely individuated than truth values, the need to resort to possible world semantics to properly characterize differences in the meaning of basic verbs, for example, is avoided. A third aspect of Situation Semantics that played an important role in HPSG is parameters. Parameters are variables that can be restricted and sorted, thus allowing for an easy semantic classification of types of NPs, something that HPSG's Binding Theory makes use of (Müller 2021a: Section 2, Chapter 20 of this volume).

Parameters also play an important role in early accounts of quantification; these accounts rely on restrictions on parameters that constrain how variables are anchored, akin to predicative conditions on discourse referents in Discourse Representation Theory (Kamp & Reyle 1993). Restrictions on parameters are illustrated with (1), the (non-empty) semantic content of the common noun *donkey*, where the variable $\boxed{1}$ is restricted to individuals that are donkeys, as expressed by the value of the attribute REST.<sup>2</sup>

$$\text{(1)}\quad\begin{bmatrix}\text{VAR}\;\boxed{1}\\ \text{REST}\;\begin{bmatrix}\text{RELN}\;\textit{donkey}\\ \text{INST}\;\boxed{1}\end{bmatrix}\end{bmatrix}$$

Because indices are restricted variables/parameters, the model of quantification proposed in Pollard & Sag (1987: Chapter 4) involves restricted quantifiers. Consider the sentence *Every donkey sneezes* and its semantic representation in (2) (Pollard & Sag 1987: 109).

<sup>2</sup> In Pollard & Sag (1987), which we discuss here, semantic relations are the values of a RELN attribute and restrictions are single semantic relations rather than set-valued. To ensure historical accuracy, we use the feature geometry that was used at the time.


The subject NP contributes the value of the attribute QUANT, while the verb contributes the value of SCOPE. The quantifier includes information on the type of quantifier contributed by the determiner (a universal quantifier in this case) and the index (a parameter restricted by the common noun).

Because HPSG is a sign-based grammar, each constituent includes a phonological and semantic component as well as a syntactic level of representation (along with other possible levels, e.g. information structure; see De Kuthy 2021, Chapter 23 of this volume). Compositionality has thus always been directly incorporated by principles that regulate the value of the mother's SEM attribute, given the SEM values of the daughters and their mode of syntactic combination (as manifested by their syntactic properties). Different approaches to semantics within HPSG propose variants of a Semantics Principle that constrains this relation. The Semantics Principle of Pollard & Sag (1987: 109) is stated in English in (3) (we assume for simplicity that there is a single complement daughter; Pollard & Sag define semantic composition recursively for cases of multiple complement daughters).

	- b. Otherwise, the semantic contents of the head daughter and the mother are identical.

The fact that the Semantics Principle in (3) receives a case-based definition is of note. Since HPSG is monostratal, there is only one stratum of representation (see Ladusaw 1982 for the difference between levels and strata). But the semantic contribution of complement daughters varies. Some complement daughters are proper names or pronouns, while others are generalized quantifiers, for example. Since it is assumed that the way in which the meaning of (free) pronouns or proper names combines with the meaning of verbs differs from the way generalized quantifiers combine with the meaning of verbs, the Semantics Principle must receive a case-based definition. In other words, syntactic combinatorics is less varied than semantic combinatorics. The standard way of avoiding violations of compositionality (the fact that semantic composition is a *function*) is to have a case-based definition of the semantic effect of combining a head daughter with its complements, a point already made in Partee (1984). As (3) shows, HPSG has followed this practice since its beginning. The reason is clear: one cannot maintain a surface-oriented approach to syntax, where syntax is "simpler", to borrow a phrase from Culicover & Jackendoff (2005), without resorting to case-based definitions of the semantic import of syntactic combinatorics.
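The case-based character of such a principle can be rendered schematically. The sketch below assumes a toy dictionary encoding of CONTENT values; its two clauses (referential vs. quantificational complements) are illustrative stand-ins for the kind of cases distinguished in (3), not Pollard & Sag's actual formulation.

```python
def combine(head_sem, comp_sem, role):
    """Combine a head's content with one complement's content, dispatching
    on the complement's semantic type (the case-based part)."""
    if comp_sem["type"] == "referential":
        # Names and free pronouns: plug the index into the head's relation.
        return dict(head_sem, **{role: comp_sem["index"]})
    if comp_sem["type"] == "quantifier":
        # Generalized quantifiers: split the content into a quantifier part
        # and a scope part, with the variable bound inside the scope.
        return {"quant": {"det": comp_sem["det"], "var": comp_sem["index"],
                          "restr": comp_sem["restr"]},
                "scope": dict(head_sem, **{role: comp_sem["index"]})}
    raise ValueError("no semantic clause for this kind of daughter")

sneeze = {"reln": "sneeze"}
print(combine(sneeze, {"type": "referential", "index": "x1"}, "sneezer"))
print(combine(sneeze, {"type": "quantifier", "det": "every",
                       "index": "x2", "restr": "donkey"}, "sneezer"))
```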

# **3 Scope relations in HPSG**

In Mainstream Generative Grammar, there is an assumption that syntactic constituency reflects semantic constituency at one stratum of representation. In the case of quantifier scope in works like May (1985), this means that quantified expressions are moved out of their surface position and raised to a position where they can receive their proper scope through Quantifier Raising (and/or Quantifier Lowering; see, among others, Hornstein 1995).<sup>3</sup> Of course, such a move requires multiple strata, as there is little evidence that quantifier scope affects surface syntactic structure. The Semantics Principle and the representation of quantifier meanings outlined in Pollard & Sag (1987) and briefly presented in the previous section were not flexible enough to model the relation between single syntactic structures and multiple scopal relations. As Pollard & Sag explicitly recognized (p. 112), their Semantics Principle only models left-to-right scopal relations, i.e. quantifiers that are expressed by a complement (or subject) that is to the left of another complement have wide scope with respect to that quantifier. So-called inverse scope, including the fact that quantifiers in object position can outscope quantifiers in subject position, cannot be modeled by the kind of Semantics Principle they propose. Much of the discussion of semantics within HPSG in the 90s pertains to improving how scope is modeled, both the scope of quantifiers and the scope of adjuncts. We discuss each in turn in this section.

# **3.1 Quantifier scope**

Until the mid-2000s, HPSG's "standard" model of the interface between the syntax and semantics of phrases that contain quantifiers adapted to HPSG the approach proposed in Cooper (1975; 1983), i.e. so-called Cooper storage: when a quantified expression combines with another expression, the quantifier is put in a store, and various scopal relations correspond to the various nodes at which the quantifier can be retrieved from storage. Within HPSG, quantifier storage involves a QSTORE attribute where each quantifier starts, and at each node, quantifiers are either retrieved (part of the RETRIEVED list) or continue to be on the mother's QSTORE.

<sup>3</sup>For a discussion of the relation between the semantic scope of aspect markers and the syntactic structures they enter in in Mainstream Generative Grammar vs. HPSG, see Koenig & Muansuwan (2005).

The relative scope of quantifiers itself is determined by the ordering of quantifiers on the QUANTS list. The simplified tree in Figure 1 from Pollard & Sag (1994: 324) illustrates the inverse scope reading of an English sentence containing two quantifiers.

Figure 1: Semantic composition of an English sentence containing two quantifiers

Both subject and object quantifiers start with their quantifiers (basically, something very similar to the representation in (2)) in a QSTORE. Since the reading of interest is the one where *a poem* outscopes *every student*, the quantifier introduced by *a poem* cannot be retrieved at the VP level. This is because the value of QUANTS is the concatenation of the value of RETRIEVED with the QUANTS value of the head daughter. Were the quantifier introduced by *a poem* ($\boxed{3}$ in Figure 1) retrieved at the VP level, the sole quantifier retrieved at the S level, the one introduced by *every student*, would outscope it. So, the only way for the quantifier introduced by *a poem* to outscope the quantifier introduced by *every student* is for the former to be retrieved at the S node just like the latter. Simplifying somewhat for presentational purposes, two principles govern how quantifiers are passed on from head daughters to mothers and how quantifier scope is assigned for retrieved quantifiers; they are stated in (4) (adapted from Pollard & Sag 1994: 322–323).


	- b. In a headed phrase (of sort *psoa*, which stands for "parameterized state of affairs"), the QUANTS value is the concatenation of the RETRIEVED value and the QUANTS value of the semantic head.

(4a) ensures that quantifiers in storage are passed up the tree, except for those that are retrieved; (4b) ensures that quantifiers that are retrieved outscope quantifiers that were retrieved lower in the tree. Retrieval at the VP level entails narrow scope of quantifiers that occur in object position; wide scope of quantifiers that occur in object position entails retrieval at the S level. But retrieval at the S level of quantifiers that occur in object position does not entail wide scope, as the order of two quantifiers in the same RETRIEVED list (i.e. retrieved at the same node) is unconstrained. Constraints on quantifier retrieval and scope underdetermine quantifier scope. To ensure that quantifiers are retrieved sufficiently "high" in the tree to bind bound-variable uses of pronouns, e.g. *her* in (5), Pollard & Sag propose the constraint in (6).
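The interaction of storage, retrieval, and the concatenation constraint in (4) can be emulated with a small enumeration procedure. The sketch below is a deliberate simplification (quantifiers are strings, nodes are indexed bottom-up, and the QSTORE bookkeeping is reduced to an "entry node" per quantifier), but it derives exactly the readings discussed for Figure 1: retrieving the object's quantifier at VP forces subject wide scope, while retrieving both at S allows either order.

```python
from itertools import permutations

def scopings(store, n_nodes):
    """Enumerate possible QUANTS lists. `store` maps each quantifier to the
    node index at which it enters the quantifier store; at each node, any
    subset of the available quantifiers is retrieved, in any relative order
    (order within RETRIEVED is free), and prepended to the QUANTS list built
    so far, so the leftmost quantifier has widest scope."""
    def go(node, pending, quants):
        if node == n_nodes:
            if not pending:          # every quantifier must be retrieved somewhere
                yield quants
            return
        avail = [q for q, entry in pending.items() if entry <= node]
        for mask in range(2 ** len(avail)):
            chosen = [q for i, q in enumerate(avail) if mask >> i & 1]
            rest = {q: e for q, e in pending.items() if q not in chosen}
            for order in permutations(chosen):
                yield from go(node + 1, rest, list(order) + quants)
    yield from go(0, dict(store), [])

# 'Every student read a poem': the object's quantifier is in store from the
# VP node (index 0), the subject's only from the S node (index 1).
for reading in scopings({"a_poem": 0, "every_student": 1}, 2):
    print(reading)   # subject-wide scope arises twice; the inverse reading
                     # only from retrieving both quantifiers at S
```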


The use of Cooper storage allows for a syntactically parsimonious treatment of quantifier scope ambiguities in that no syntactic ambiguity needs to be posited to account for what is a strictly semantic phenomenon. But as Pollard & Sag note (p. 328), their model of quantifier scope does not account for the possible narrow scope interpretation of the quantifier *a unicorn* in (7) (the interpretation according to which the speaker does not commit to the existence of unicorns). Raised arguments only occur once, in their surface position, and (4a–b) ensure that quantifiers are never retrieved "lower" than their surface position.

(7) A unicorn appears to be approaching.

Pollard & Yoo (1998) tackle that problem, as well as take into account the fact that a sentence such as (8) is ambiguous (i.e. the quantifier *five books* can have wide or narrow scope with respect to the meaning of *believe*).

(8) Five books, I believe John read. (ambiguous)


As Pollard & Yoo note, since quantifier storage and retrieval is a property of signs, and fillers (see Borsley & Crysmann 2021, Chapter 13 of this volume) only share their LOCAL attribute values with arguments of the head (*read* in (8)), the narrow scope reading cannot be accounted for. (7) and (8), among other similar examples, illustrate some of the complexities of combining a surface-oriented approach to syntax with a descriptively adequate model of semantic composition.

Pollard & Yoo's solution (p. 419–420) amounts to making quantifier storage and retrieval a property of the LOCAL value and to restricting quantifier retrieval to semantically potent heads (so, the *to* of infinitive VPs cannot be a site for quantifier retrieval). The new feature geometry of *sign* that Pollard & Yoo propose is represented in (9). The POOL of quantifiers collects the quantifiers on the QSTORE of a head's selected arguments (members of the SUBJ, COMPS, and SPR lists, and the value of MOD, except for quantifying determiners and semantically vacuous heads like *to* or *be*), and the constraints in (10) and (11) (Pollard & Yoo 1998: 423) ensure proper percolation of quantifier store values within headed phrases as well as the semantic order of retrieved quantifiers.

$$\text{(9)}\quad\begin{bmatrix}\textit{sign}\\ \text{PHONOLOGY}\;\textit{list(phon-string)}\\ \text{SYNSEM}\;\begin{bmatrix}\text{LOCAL}\;\begin{bmatrix}\text{CATEGORY}\;\textit{category}\\ \text{CONTENT}\;\textit{content}\\ \text{QSTORE}\;\textit{set(quantifier)}\\ \text{POOL}\;\textit{set(quantifier)}\end{bmatrix}\end{bmatrix}\\ \text{RETRIEVED}\;\textit{list(quantifier)}\end{bmatrix}$$

$$\text{(10)}\quad\begin{bmatrix}\text{SYNSEM|LOCAL}\;\begin{bmatrix}\text{QSTORE}\;\boxed{1}\\ \text{POOL}\;\boxed{2}\end{bmatrix}\\ \text{RETRIEVED}\;\boxed{3}\end{bmatrix}\;\wedge\;\textit{set-of-elements}(\boxed{3},\boxed{4})\;\Rightarrow\;\boxed{4}\subseteq\boxed{2}\;\wedge\;\boxed{1}=\boxed{2}-\boxed{4}$$

	- b. For a semantically non-vacuous lexical head, the QUANTS value is token-identical with the RETRIEVED value.

What remains in the QSTORE of a sign are the quantifiers that were in the POOL of (unretrieved) quantifiers minus the quantifiers that were retrieved, according to the constraint in (10). Since the POOL of the sign's mother is the QSTORE of its head daughter as per constraint (11a), a quantifier retrieved on a head daughter is not part of the POOL of (unretrieved) quantifiers of the mother.

In a follow-up paper, Przepiórkowski (1998) proposed a strictly lexicalized retrieval mechanism which removes structural ambiguities arising from different possible retrieval sites for quantifiers along a syntactic head path, is compatible with trace-based and traceless analyses of extraction (Pollard & Yoo's analysis only covers trace-based extraction), and shifts all semantic structure under the CONTENT attribute.

# **3.2 Adjunct scope**

HPSG phrase structure schemata are built, for a significant part, around headed structures. In the case of the head-complement or head-subject schemata, syntactic headedness and semantic headedness match. The verb is the head of VPs and clauses, and the circumstance or state of affairs denoted by verbs typically takes as arguments the indices of its complements or subjects, and more generally, part of the CONTENT value of the verb takes as arguments part of the CONTENT value of its dependents. But in the case of head-adjunct structures, syntactic and semantic headedness do not match. The denotation of adjuncts often takes the denotation of heads as arguments. Thus, in (12), fastness is ascribed to Bob's running. Accordingly, the Semantics Principle distinguishes between head-adjunct structures and other structures, as shown in (13) (Pollard & Sag 1994: 56). (The principle we cite does not consider the quantifier retrieval we discussed in the previous section.)


Unfortunately, the hypothesis that the content of phrases "projects" from the adjunct in the case of head-adjunct structures leads to difficulties in the case of so-called recursive modification, e.g. (14), as Kasper (1997) shows.

(14) a potentially controversial plan

The NP in (14) denotes an existential quantifier whose restriction is a plan that is potentially controversial; intuitively speaking, what is potential is the controversiality of the plan, not its being a plan. But the Semantics Principle, the syntactic selection of modified expressions by modifiers, and lexical entries for intersective and non-intersective adjectives conspire to lead to the wrong meaning for recursive modification of the kind (14) illustrates: since *controversial* selects for *plan*, combining their meanings leads to the meaning represented in (15), as *controversial* is an intersective adjective.

$$\text{(15)}\quad\begin{bmatrix}\textit{nom-obj}\\ \text{INDEX}\;\boxed{1}\\ \text{RESTR}\;\begin{bmatrix}\text{RELN}\;\textit{plan}\\ \text{INST}\;\boxed{1}\end{bmatrix}\;\&\;\begin{bmatrix}\text{RELN}\;\textit{controversial}\\ \text{ARG}\;\boxed{1}\end{bmatrix}\end{bmatrix}$$

But since adjuncts are the semantic head, the meaning of *potentially controversial plan* will be projected from the meaning of *potentially*, the most deeply embedded adjunct. Now, *potentially* is a conjectural adverb, to adapt to adverbs the classification of adjectives proposed by Keenan & Faltz (1985: 125). Within HPSG, this means that the meaning of *potentially* is a function that takes the meaning of what it modifies as argument, i.e. the meaning represented in (15). But this leads to the meaning represented in (16), which is the wrong semantics, as a potentially controversial plan is not a potential plan, as Kasper (1997: 10–11) points out.

$$\text{(16)}\quad\begin{bmatrix}\textit{nom-obj}\\ \text{INDEX}\;\boxed{1}\\ \text{RESTR}\;\begin{bmatrix}\text{RELN}\;\textit{potential}\\ \text{ARG}\;\begin{bmatrix}\text{RELN}\;\textit{plan}\\ \text{INST}\;\boxed{1}\end{bmatrix}\;\&\;\begin{bmatrix}\text{RELN}\;\textit{controversial}\\ \text{ARG}\;\boxed{1}\end{bmatrix}\end{bmatrix}\end{bmatrix}$$

The problem with Pollard & Sag's Semantics Principle, when it comes to recursive modification, is clear: semantic selection follows an adjunct path, so to speak, so the most deeply embedded adjunct will have widest scope.

Kasper's solution is to distinguish the inherent meaning of an expression (its regular content) from meanings it may have in a particular construction: its combinatorial semantics (its internal and external content). With respect to prenominal adjuncts, the internal content corresponds to the content of the adjunct's maximal projection, whereas the external content corresponds to the content of the combination of the adjunct's meaning with what it modifies. The Semantics Principle is revised to reflect the distinction between internal and external contents and is provided in (17) (Kasper 1997: 19).

	- b. For all other headed phrases, the CONT value is token-identical to the CONT value of the head daughter.


The result of applying the revised Semantics Principle to *potentially controversial* is provided in Figure 2, and the semantics of *controversial* and *potentially* are provided in (18) and (19), respectively. (The value of ARG in Kasper's analysis of *controversial* corresponds to the syntactic and semantic properties of the modified constituent. The ARG value is thus the equivalent of the SYNSEM value in current HPSG.)

Figure 2: Kasper's analysis of *potentially controversial*


Critically, each kind of modifier specifies in its MOD|ECONT value the combinatorial effects it has on the meaning of the modifier and modified combination; the ECONT value contains that result and will be inherited as CONT value by the mother node. Intersective adjectives like *controversial* specify that their combinatorial effect is intersective, as shown in (18); conjectural adverbs like *potentially*, on the other hand, specify their CONT value as the result of applying their meaning to the CONT value of the modified sign. As shown in the left daughter of Figure 2, the resulting CONT value of *potentially* is identified by (17a) with its ICONT value (which is in turn lexically specified as identical with the ECONT value) when *potentially* combines as adjunct daughter with an intersective adjective such as *controversial* in a head-adjunct structure. Moreover, when the depicted phrase *potentially controversial* combines with a noun such as *plan* in another head-adjunct phrase, $\boxed{5}$ and $\boxed{6}$ in Figure 2 become identical, again by (17a), thereby also integrating the meaning of *potentially controversial* into the second conjunct of the ECONT|RESTR value of the depicted phrase. Now, since the MOD value of the head in a head-adjunct phrase determines the MOD value of the phrase, it means that *controversial* determines in its ECONT, in combination with its ARG value, what it modifies (*plan*) and, ultimately, the CONT value of the entire phrase *potentially controversial plan*, thus ensuring that its intersectivity is preserved even when it combines with a conjectural adverb. Asudeh & Crouch (2002) and Egg (2004) provide more recent solutions to the same problem through the use of a Glue Semantics approach to meaning composition within HPSG and semantic underspecification, respectively. On Glue Semantics in LFG, see also Wechsler & Asudeh (2021: Sections 12.2 and 12.3), Chapter 30 of this volume.

# **4 Sorting semantic objects**

One of the hallmarks of HPSG is that all grammatical objects are assigned a sort (see Abeillé & Borsley 2021: Section 3 and Richter 2021: Section 2 for details). This includes semantic objects. Sorting of semantic objects has been used profitably in models of lexical knowledge, in particular in models of argument structure phenomena. We refer the reader to Davis, Koenig & Wechsler (2021), Chapter 9 of this volume, for details about argument structure and only provide an illustrative example here. Consider the constraint in (20) from Koenig & Davis (2003: 231). It says that all verbs that denote a causal change of state, i.e. verbs whose CONTENT values are of sort *cause-rel*, link their causer argument to an NP that is the first member of the ARG-ST list.


$$\text{(20)}\quad\begin{bmatrix}\text{CONTENT}\;\begin{bmatrix}\textit{cause-rel}\\ \text{CAUSE}\;\boxed{1}\end{bmatrix}\\ \text{ARG-ST}\;\langle\text{NP},\dots\rangle\end{bmatrix}\Rightarrow\begin{bmatrix}\text{ARG-ST}\;\langle\text{NP}_{\boxed{1}},\dots\rangle\end{bmatrix}$$

Critically, verbs like *frighten*, *kill*, and *calm* have as meanings a relation that is a subsort of *cause-rel* and are therefore subject to this constraint. Sorting lexical semantic relations thus makes for a compact statement of linking constraints. (The chapter on argument structure provides many more instances of the usefulness of sorting semantic relations, a hallmark of HPSG semantics.)
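How a single sort-based constraint covers a whole family of verbs can be illustrated with a toy class hierarchy; the class names and the dictionary encoding of ARG-ST below are ad hoc stand-ins for the sorts and lists of (20), not an implementation of Koenig & Davis's formalism.

```python
class Rel: ...
class CauseRel(Rel):
    def __init__(self, causer):
        self.causer = causer
# Subsorts of cause-rel inherit the constraint, just as frighten, kill,
# and calm do in the sort hierarchy.
class FrightenRel(CauseRel): ...
class KillRel(CauseRel): ...

def satisfies_linking(content, arg_st):
    """Check (20): if the content is of (a subsort of) cause-rel and the
    first ARG-ST member is an NP, the causer must be that NP's index."""
    if isinstance(content, CauseRel) and arg_st and arg_st[0]["cat"] == "NP":
        return content.causer == arg_st[0]["index"]
    return True   # the constraint says nothing about other contents

arg_st = [{"cat": "NP", "index": "x1"}, {"cat": "NP", "index": "x2"}]
print(satisfies_linking(FrightenRel(causer="x1"), arg_st))  # True
print(satisfies_linking(KillRel(causer="x2"), arg_st))      # False: mislinked
```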

Constructional analyses that flourished in the late 1990s also benefited from the sorting of semantic objects. The analysis of clause types in Sag (1997) and Ginzburg & Sag (2000) makes extensive use of the sorting of semantic objects to model different kinds of clauses, as our discussion of the latter in the next section makes clear.

# **5 The advantages of a surface-oriented grammar**

Until now, we have mostly covered how semantic composition works in an approach where each node in a tree is associated with a meaning and where there is only one stratum and therefore the "location" of an expression in a syntactic tree does not necessarily correspond to where its meaning is composed: direct object quantifiers, for example, are syntactic sisters of the verb, even when they have wide scope over a quantifier in subject position. Although important as a proof that semantic composition can be modeled in a surface-oriented grammar, it is fair to say that HPSG work until the late 1990s does not have too much new insight to contribute to our understanding of the interface between syntax and semantics. This is in no way a slight of that early research on the interface between syntax and semantics. Demonstrating that you can "get things right" without multiple strata is important, and work on the relation between lexical meaning and argument structure (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume) is also important in showing that simplicity of syntactic representation does not come at the cost of adequacy. The message was good news: you do not need to make your syntax more complex in order to interface it with semantics. Of course, that was Montague's point already in the late 1960s and early 1970s (see the collected papers in Montague 1974), but that work was more a proof of concept. Carrying out what is basically the Montagovian agenda with a large scale grammar is more difficult, and this is what early work in HPSG, at least retrospectively, seems to have focused on.


The development of a more constructional HPSG in the mid-90s opened up new possibilities for modeling the interface between syntax and semantics. One of these possibilities is to organize families of informationally rich phrasal constructions into a multi-dimensional inheritance hierarchy so as to model the shared semantic combinatorics of quite distinct constructional patterns. This is, for example, apparent in Sag (1997), where a single modification meaning is assigned to a family of relative constructions that differ markedly syntactically. This is also what Ginzburg & Sag (2000) show with their analysis of interrogatives. But their analysis goes further in demonstrating that there may be advantages to a surface-oriented approach to syntax in that it correctly predicts an effect of the *surface* syntax onto semantics for the interpretation of interrogatives, as we now show.

The approach to interrogatives that Ginzburg & Sag propose is new in that it does not rely on the traditional Hamblin semantics for questions, namely that the meaning of questions is the set of (exhaustive) answers; see Hamblin (1973) and Groenendijk & Stokhof (1997). Rather, the meaning of questions consists of propositional abstracts (not sets of propositions). Parameters of the kind that have been part of HPSG approaches to semantics since the beginning are used to model these propositional abstracts. Because the meaning of questions consists of propositional abstracts, the meaning of *wh*-phrases is not the same as that of generalized quantifiers either; rather, *wh*-phrases introduce a parameter (roughly, the equivalent of a lambda-abstracted variable). (21a) and (21b) provide examples of the meaning of *wh*-questions and polar questions, respectively (Ginzburg & Sag 2000: 137), where the AVM that follows ↦ is a description of the value of the CONTENT attribute of the expression that precedes ↦. Note that polar questions are modeled as zero-parameter propositional abstracts.

$$\text{(21)}\quad\text{a. Who left?}\;\mapsto\;\begin{bmatrix}\textit{question}\\ \text{PARAMS}\;\left\{\begin{bmatrix}\textit{param}\\ \text{INDEX}\;\boxed{1}\\ \text{RESTR}\;\{\textit{person}(\boxed{1})\}\end{bmatrix}\right\}\\ \text{PROP}\;\begin{bmatrix}\textit{proposition}\\ \text{SIT}\;\textit{s}\\ \text{SOA}\;\begin{bmatrix}\text{QUANTS}\;\langle\rangle\\ \text{NUCL}\;\begin{bmatrix}\textit{leave-rel}\\ \text{LEAVER}\;\boxed{1}\end{bmatrix}\end{bmatrix}\end{bmatrix}\end{bmatrix}$$


The meaning assigned to questions illustrated above relies on an ontology of messages (the semantic content a clause expresses) which is richer than the traditional notion of propositional content (as distinct from illocutionary force) in work such as Searle (1969). Questions in this view are not just a speech act (where the propositional content of that act remains a proposition), but rather a particular kind of propositionally constructed message, namely a proposition-cum-parameters, as shown in Figure 3. Crucially, questions are defined as parameterized propositions.

Figure 3: A hierarchy of sorts of messages

Of concern to us here is less the specifics of this ontology of messages (or of the introduction in the universe of discourse of place holders and other abstract objects, as is typical of Situation Semantics) than its role in the interface between syntax and semantics, e.g. the fact that clause types can refer to different kinds of messages. Declarative and interrogative clauses are defined as in (22), where the expression that precedes the colon indicates the sort of the phrase and what follows the colon is an informal representation of properties of the phrase's constituents (AVMs to the left of the arrow are properties of the mother node and what follows the arrow are properties of the daughters), / indicates default identity between information on the mother and daughter nodes in (22a), and "…" in (22b) informally indicates the absence of constraints on daughters on the general *inter-cl* sort.

$$\text{(22) a. } \textit{decl-cl: }\begin{bmatrix}\text{CONT}\begin{bmatrix}\textit{austinian}\\ \text{SOA } /\,1\end{bmatrix}\end{bmatrix} \;\rightarrow\; \text{H}\begin{bmatrix}\text{CONT } 1\end{bmatrix}$$

$$\hphantom{\text{(22) }}\text{b. } \textit{inter-cl: }\begin{bmatrix}\text{CONT } \textit{question}\end{bmatrix} \;\rightarrow\; \dots$$

In contrast to earlier approaches to semantics in HPSG, where combining a VP with a subject amounted to nothing more than adding the relevant information in the event structure (akin to functional application), this more constructional approach adds to the "traditional" subject-predicate construction a type-shift unary rule that maps a state of affairs description onto a proposition. In other words, the analysis of clause types familiar from traditional grammar plays an explicit role in the grammar, as clause types are associated with particular kinds of semantic content. Interrogative clauses (clauses of sort *inter-cl*) are partially defined by their message, i.e. as denoting questions. Different kinds of interrogatives (polar interrogatives, *wh*-interrogatives, and *in situ* interrogatives) can then be defined as subsorts of *inter-cl*. Because this constructional analysis of clause types is embedded in a multiple inheritance network of constructions, an elegant model of similarities in syntax that do and do not correspond to similarities in meaning becomes possible. For example, English declaratives, like typical *wh*-interrogatives, can be inverted (and therefore some declaratives are subject-auxiliary-inversion phrases, as in *Under no circumstance will I allow Tobi to go out at night*; see Fillmore 1999 for a study of the family of inversion constructions in English) and, conversely, some interrogatives are not inverted (*in situ* interrogatives, in particular), while others must be (polar interrogatives). Embedding a constructional semantics (i.e. the association of meaning with particular kinds of clauses) in a multidimensional analysis of phrases allows a model that associates meaning with only some structures. It is similar to some versions of Construction Grammar (see Müller 2021c, Chapter 32 of this volume), but it does not require phrasal constructions to be associated with an unpredictable meaning (i.e. with more than the equivalent of functional application in Categorial Grammar-like approaches).

One particularly interesting aspect of the constructional semantics of Ginzburg & Sag (2000) is that it can model differences in the scoping possibilities of the parameters associated with *wh*-phrases that occur as fillers of head-filler structures and those associated with *wh*-phrases that occur *in situ*. Consider the sentences in (23) and the difference in interpretation that they can receive. (The observation is due to Baker 1970; see Ginzburg & Sag 2000: 242–246 for discussion.) Sentence (23a) only has interpretation (24a) and, similarly, sentence (23b) has interpretation (24b).

(23)	b. Who wondered *what* was seen by who?

(24)	b. For which person *x* and person *y* did *x* wonder which thing was seen by *y*?

The generalization seems to be that the scope of the parameters introduced by *wh*-phrases that occur in filler position (i.e. as (part of) the filler daughter of a head-filler phrase) is constrained by their surface position, whereas *wh*-phrases that occur *in situ* are not so constrained. Thus, *who* in (23a) (*what* in (23b)) cannot outscope the embedded clause, but *what* in (23a) (*who* in (23b)) can. The explanation for this puzzling observation runs as follows. *Wh*-interrogatives are a subsort of interrogative clauses and of head-filler phrases. They are thus subject to the Filler Inclusion Constraint in (25), which requires the WH value of the filler to be a retrieved parameter (i.e. to become part of the PARAMS set; ⊎ in the statement of the constraint stands for disjoint union, i.e. the intersection of the two sets is the empty set). This constraint ensures that *wh*-phrases that are fillers of a head-filler phrase contribute their parameter in the clause they are fillers of. In contrast, the parameters of *wh*-phrases that remain *in situ* are not so constrained and are thus free either to be retrieved in the clause in which they occur or to be retrieved in a higher clause.

(25) Filler Inclusion Constraint:
$$\textit{wh-inter-cl: }\begin{bmatrix}\text{CONT}\,|\,\text{PARAMS } 1 \uplus \textit{set}\end{bmatrix} \;\rightarrow\; \begin{bmatrix}\text{WH } 1\end{bmatrix},\ \text{H}$$
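Procedurally, the constraint amounts to this: a filler's parameter must be retrieved in its own clause, while the parameter of an *in situ* *wh*-phrase may stay in a store and be retrieved higher up. The following Python sketch is our own simplification of Ginzburg & Sag's set-valued PARAMS and WH features, with invented helper names:

```python
def clause(filler_params, in_situ_params, retrieve_in_situ=False):
    """Return (PARAMS retrieved here, store handed to a higher clause).

    Filler parameters are always retrieved locally, mirroring the
    Filler Inclusion Constraint; in situ parameters may be retrieved
    here or stay in the store and be retrieved higher up.
    """
    params = set(filler_params)
    store = set()
    for p in in_situ_params:
        (params if retrieve_in_situ else store).add(p)
    return params, store

# (23b) "Who wondered what was seen by who?": the embedded filler "what"
# scopes in the embedded clause; in situ "who" may scope in the matrix.
embedded, handed_up = clause({"what"}, {"who_in_situ"})
matrix, _ = clause({"who"}, handed_up, retrieve_in_situ=True)
assert "what" in embedded and "who_in_situ" in matrix
```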

It should be noted that the combination of a constructional and a surface-oriented approach to the semantics of interrogatives requires positing several unary branching constructions whose sole function is to "type-shift" the meaning of the daughter phrase to match the semantic requirements of the phrase it occurs in. Consider the discourses in (26) and (27) (from Ginzburg & Sag 2000: 270, (37) and 280, (63a)), a reprise and a non-reprise use, respectively, of *in situ wh*-phrases.

(26)	B: Jo saw absolutely every shaman priest from WHERE?

(27)	B: OK, so you'll be leaving WHEN exactly?


For simplicity, we focus on the latter case, which involves an "ordinary" question interpretation. Since B's answer is syntactically a declarative subject-predicate clause, its meaning will be of sort *proposition* (as will that of any head-subject clause that is not a *wh*-subject clause). But the meaning associated with this construction is that of a question. So, we need a unary-branching construction that maps the propositional meaning onto the question meaning, i.e. one that retrieves the stored parameter contributed by the *wh*-phrase and makes the CONTENT of the head-subject phrase the value of the PROP attribute of a question. This is what is accomplished by the *is-inter-cl* construction defined in (28). One of the subsorts of this construction is the one involved in discourse (27) and defined in (29) (the Independent Clause feature value "+" in (28) is meant to prevent *in situ* interrogatives from being embedded interrogatives). Assigning distinct messages to different clause types while maintaining a surface-oriented approach requires quite a few such unary branching constructions whose function is strictly semantic.<sup>4</sup>

$$\text{(28) } \textit{is-inter-cl: }\begin{bmatrix}\text{IC } +\\ \text{CAT}\begin{bmatrix}\text{SUBJ } \langle\rangle\\ \text{VFORM } \textit{fin}\end{bmatrix}\end{bmatrix} \;\rightarrow\; \text{H}\,[\ ]$$


# **6 Semantic underspecification**

One of the hallmarks of constraint-based grammatical theories is the view that grammars involve *descriptions* of structures and that these descriptions can be non-exhaustive or incomplete, as almost all descriptions are. This is a point that was made clear a long time ago by Martin Kay in his work on unification (see among others Kay 1979). For a long time, the distinction between (partial) descriptions (possible properties of linguistic structures, what grammars are about) and (complete) described linguistic structures was used almost exclusively in the syntactic component of grammars within HPSG. But starting in the mid-1990s, the importance of distinguishing between descriptions and described structures began to be appreciated in HPSG's model of semantics, as discussed for example in Nerbonne (1993), in Frank & Reyle's HPSG implementation of Underspecified Discourse Representation Theory (Frank & Reyle 1992; 1995), and in Copestake, Flickinger, Malouf, Riehemann & Sag (1995), and recent work has also stressed the importance of the same distinction when modeling inflectional morphology (see Crysmann & Bonami 2016 and Crysmann 2021: Section 3, Chapter 21 of this volume). Because underspecification, partiality, and the like are so critical to HPSG, their inclusion in the model of the semantics of grammar has made recent work in semantics in HPSG quite distinct from work in semantics within even conceptually related frameworks such as Lexical Functional Grammar (see Bresnan & Kaplan 1982, among others, and Wechsler & Asudeh 2021, Chapter 30 of this volume) or variants of Categorial Grammar (see Steedman 1996, among others, and Kubota 2021, Chapter 29 of this volume). Two competing approaches to semantic underspecification have been developed within HPSG: Minimal Recursion Semantics (henceforth MRS; see Copestake et al. 1995, Copestake et al. 2001, and Copestake et al. 2005 for introductions) and Lexical Resource Semantics (henceforth LRS; see Richter & Sailer 2004a,b and Iordăchioaia & Richter 2015 for an introduction). MRS and LRS are not the only "recent" approaches to assembling the meaning of phrases from lexical "meanings" (or resources). Asudeh & Crouch (2002), for example, show how to apply a glue approach to semantic interpretation to HPSG. Aside from a simplification of the Semantics Principle (which, under a Glue Semantics approach, does not distinguish how to compose meaning on the basis of the semantic type of the daughters, e.g. whether one of the daughters is a quantifier), a glue approach leads to "highly efficient techniques for semantic derivation already implemented for LFG, and which target problems of ambiguity management also addressed by Minimal Recursion Semantics" (p. 1). For reasons of space, we cannot detail Asudeh and Crouch's glue approach here; we concentrate on MRS and LRS, as they have been the dominant approaches to semantic composition in HPSG in recent years. But the existence of yet another approach to semantic interpretation attests to the flexibility of the HPSG architecture when it comes to modeling the interface between syntax and semantics.

<sup>4</sup>Müller (2015) argues that a surface-oriented grammar does not have to rely on a hierarchy of clause types to model the interaction of clause types and semantics: appropriate lexical specifications on verbs (as the heads of sentences) and phrasal principles that exploit the local internal structure of a sentence's immediate daughters can be used to achieve the same effects. See also Müller (2021b: Section 5.3), Chapter 10 of this volume.

# **6.1 Minimal Recursion Semantics**

### **6.1.1 Why minimally recursive semantic representations**

MRS developed out of computational semantic engineering considerations related to machine translation for face-to-face dialogue that started in the early 1990s (see Kay et al. 1994 for an overview of the *Verbmobil* project). As Copestake et al. (1995) argue, syntactic differences between languages can lead to logically equivalent but distinct semantic representations when using traditional "recursive" semantic representations. They point out, for example, that the English expression *fierce black cat* and Spanish *gato negro y feroz* would be given distinct semantic representations under standard assumptions, as shown in (30).

(30)	a. λx.(fierce(x) ∧ (black(x) ∧ cat(x)))

	b. λx.(cat(x) ∧ (black(x) ∧ fierce(x)))

These distinct semantic representations would make translating these simple nominal expressions from one language to the other difficult. Furthermore, some sentences may be similarly ambiguous in English and Spanish (for example, sentences that contain generalized quantifiers), and requiring the semantic disambiguation of these sentences prior to translating them into sentences that contain similar ambiguities is inefficient. Semantic representations should only be as disambiguated as the source language grammar entails. For these reasons and others they detail, Copestake et al. (1995) propose to model the semantics of grammar via semantic representations that are as flat (or non-recursive) as possible. To achieve this minimal recursivity despite the fact that disambiguated scope relations among generalized quantifiers require embedding, they add variables or handles that serve as labels of particular relations in the flat list of relations and that can serve as "arguments" of scopal operators. (31) and its underspecified and fully disambiguated semantic representations in (32) illustrate this informally, and (33) does so more formally. Subscripts on names of relations in the informal representation stand for labels of the formulas they are part of. Thus, 1 in every₁(x, 3, h) is a label for the entire formula. In the more explicit representation in (33), the label of a formula is written before it and separated from it by a colon (e.g. h1:every(x, h3, h2)); variables over labels are simply labels that do not correspond (yet) to labels of formulas (h2 and h6).<sup>5</sup>

(31)	Every dog chased some cat.

(32)	a. every₁(x, 3, h), dog₃(x), cat₇(y), some₅(y, 7, h′), chase₄(x, y)

	b. every₁(x, 3, 4), dog₃(x), cat₇(y), some₅(y, 7, 1), chase₄(x, y)

	c. every₁(x, 3, 5), dog₃(x), cat₇(y), some₅(y, 7, 4), chase₄(x, y)

<sup>5</sup>Copestake (2007) presents a Neo-Davidsonian version of MRS called R(obust)MRS where arguments of predicates (aside from their event variable) are contributed via independent elementary predications. Copestake shows that RMRS can be profitably used with shallower analyses, "including part-of-speech tagging, noun phrase chunking and stochastic parsers which operate without detailed lexicon" (p. 73); see Peldszus & Schlangen (2012) for how RMRS allows for the incremental construction of meaning representations in dialogue systems.


(33)	a. h1:every(x, h3, h2), h3:dog(x), h7:cat(y), h5:some(y, h7, h6), h4:chase(x, y)

	b. h6 = h1, h2 = h4

	c. h2 = h5, h6 = h4

To understand the use of handles, consider the expression every₁(x, 3, h). The restriction argument of the generalized quantifier is the handle numbered 3, which is a label for the formula dog₃(x). The formula that serves as the restriction of *every* is fixed: it is always the meaning of the nominal phrase that the determiner selects for. But to avoid embedding that relation as the restriction of the quantifier and to preserve the desired flatness of semantic representations, the restriction argument of *every* is not dog₃(x) itself, but the label of that formula (indicated by the subscript 3 on the predication dog₃(x)). Now, in contrast to the quantifier's restriction, which must include the content of the head noun it combines with, the nuclear scope or body of the quantifier is not as restricted. In other words, the semantic representation determined by an MRS grammar of English does not fix the scope argument of *every*, represented here as the variable over handles h. The same distinction applies to *some*: its restriction argument is fixed to the label of the formula cat₇(y), but its scope argument is left underspecified, as indicated by the variable over labels h′. Resolving the scope ambiguity of the underspecified representation in (32a) amounts to deciding whether *every* takes the formula that contains *some* in its scope or the reverse; in the first case, h = 5, in the second, h′ = 1. Since the formula that encodes the meaning of the verb (namely, chase₄(x, y)) is outscoped by the nuclear scope or body of both generalized quantifiers, either constraint will fully determine the relative scope of all formulas in (32). Although it is possible to use a typed feature structure formalism to resolve scope ambiguities, Copestake et al. (1995: 309–311) argue that it is more efficient for the relevant resolution process not to be part of the grammar and to be left to a separate algorithm.

### **6.1.2 The nitty-gritty**

We now present a brief outline of how MRS works in typed feature structures. First, the content of an expression is of sort *mrs*. Structures of that sort consist of (1) a bag of relations or elementary predications (the value of RELS), (2) a HOOK, which groups together the labels or handles that correspond to the elementary predications that have widest local and global scope and the expression's index (these three semantic objects are what is visible to semantic functors), and (3) a set of constraints on handles that restrict or determine the scope of scope-relevant elementary predications (the value of HCONS, for handle constraints). Each constraint in the value of HCONS consists of a "greater than or equal" (*qeq*) relation between handles. A representation of the structure of an object of sort *mrs* is provided in (34).

$$\text{(34)}\;\begin{bmatrix}\textit{mrs}\\ \text{HOOK}\begin{bmatrix}\text{INDEX } \textit{index}\\ \text{LTOP } \textit{handle}\\ \text{GTOP } \textit{handle}\end{bmatrix}\\ \text{RELS } \textit{list(relation)}\\ \text{HCONS } \textit{list(qeq)}\end{bmatrix}$$
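Read as a data structure, (34) says that an *mrs* bundles a hook, a bag of elementary predications, and a list of handle constraints. A minimal Python rendering may make this concrete; the attribute names follow (34), but the class layout is our own sketch and not part of any actual MRS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EP:            # an elementary predication (sort: relation)
    lbl: str         # LBL: the handle labeling this predication
    pred: str        # e.g. "every_rel", "dog_rel"
    args: Dict[str, str] = field(default_factory=dict)  # ARG0, RSTR, ...

@dataclass
class Qeq:           # a "greater than or equal" scope constraint
    harg: str        # the hole (the outscoping side)
    larg: str        # the label (the outscoped side)

@dataclass
class Hook:          # what is visible to semantic functors
    index: str
    ltop: str        # local top handle
    gtop: str        # global top handle

@dataclass
class MRS:
    hook: Hook
    rels: List[EP] = field(default_factory=list)
    hcons: List[Qeq] = field(default_factory=list)
```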

Sentence (35) and its (underspecified) *mrs* representation in (36) illustrate how *mrs* structures can be used to capture scope underspecification (see Copestake et al. 2005: 306).

(35)	Every dog probably sleeps.

$$\text{(36)}\;\begin{bmatrix}
\textit{mrs}\\
\text{HOOK}\begin{bmatrix}\text{GTOP } 0\\ \text{LTOP } 5\end{bmatrix}\\
\text{RELS}\Big\langle
\begin{bmatrix}\textit{every\_rel}\\ \text{LBL } 2\\ \text{ARG0 } 3\\ \text{RSTR } 4\\ \text{BODY } \textit{handle}\end{bmatrix}\!,
\begin{bmatrix}\textit{dog\_rel}\\ \text{LBL } 6\\ \text{ARG0 } 3\end{bmatrix}\!,
\begin{bmatrix}\textit{prbly\_rel}\\ \text{LBL } 5\\ \text{ARG1 } 7\end{bmatrix}\!,
\begin{bmatrix}\textit{sleep\_rel}\\ \text{LBL } 8\\ \text{ARG1 } 3\end{bmatrix}
\Big\rangle\\
\text{HCONS}\Big\langle
\begin{bmatrix}\textit{qeq}\\ \text{HARG } 0\\ \text{LARG } 5\end{bmatrix}\!,
\begin{bmatrix}\textit{qeq}\\ \text{HARG } 4\\ \text{LARG } 6\end{bmatrix}\!,
\begin{bmatrix}\textit{qeq}\\ \text{HARG } 7\\ \text{LARG } 8\end{bmatrix}
\Big\rangle
\end{bmatrix}$$

Members of RELS correspond to the content of lexical entries, while members of HCONS constrain the relative scope of semantic arguments of members of RELS. Now, although the grammar of English leaves the meaning of (35) underspecified, it *does* constrain some scope relations, and the *mrs* in (36) therefore constrains how some elementary predications relate to each other. First, the identity between the values of ARG0 for the *every_rel* and *dog_rel* elementary predications indicates that *every* in (35) quantifies over dogs; 3 is the variable bound by the quantifier. And similarly, the value of ARG1 of *sleep_rel* is lexically constrained to correspond to the index of the subject, itself constrained to be identical to the value of ARG0 of the *dog_rel* predication (i.e. 3). Second, *prbly_rel* is required to outscope *sleep_rel* (a *qeq* constraint either identifies its HARG and LARG or constrains its HARG to outscope its LARG). Similarly, the restriction of *every_rel* is constrained to outscope *dog_rel*, as 4 =q 6. Finally, the global top (the value of GTOP) is constrained to outscope the local top (the value of LTOP). (To simplify, the local top is the handle of the elementary predication that is not a quantifier and has the widest scope.) The semantic representation that the grammar of English motivates remains underspecified, as it does not specify what the value of the BODY of *every_rel* is, in particular whether it is the handle of the *prbly_rel* or of the *sleep_rel* elementary predication. Resolving this scope ambiguity amounts to adding an HCONS that identifies the value of BODY with either handle, i.e. 5 or 8.
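The two readings of (35) can be enumerated mechanically, since the only open choice in (36) is the BODY of *every_rel*. In the following sketch, the handle names follow the tags in (36), but the resolution loop is our own toy stand-in for the separate scope-resolution algorithm mentioned above.

```python
# Elementary predications of (35), keyed by their labels (cf. (36)).
rels = {
    "2": ("every_rel", {"ARG0": "3", "RSTR": "4", "BODY": None}),
    "6": ("dog_rel",   {"ARG0": "3"}),
    "5": ("prbly_rel", {"ARG1": "7"}),
    "8": ("sleep_rel", {"ARG1": "3"}),
}
# qeq constraints (HARG, LARG): 0 is the global top.
hcons = [("0", "5"),   # GTOP outscopes the local top (probably)
         ("4", "6"),   # restriction of "every" outscopes "dog"
         ("7", "8")]   # "probably" outscopes "sleep"

# The grammar leaves BODY open; each admissible filler is one reading.
for body in ("5", "8"):
    reading = ("every > probably > sleep" if body == "5"
               else "probably > every > sleep")
    print(f"BODY = {body}: {reading}")
```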

Examples that include multiple quantifiers work in a similar way. Take the sentence in (37) and the elementary predications for *every*, *chases*, and *some* in (38) (we only include relevant elementary predications and attributes for simplicity). We know that the body of *every_rel* and of *some_rel* each outscope *chase_rel* (so 1 =q 3 and 2 =q 3, where the left-hand side corresponds to the HARG and the right-hand side to the LARG).<sup>6</sup> But we do not know if *every_rel* outscopes *some_rel* or the reverse; adding either the HCONS 1 =q 2 or 2 =q 1 specifies which is the case. (This example illustrates that =q is not commutative, as it is meant to encode greater or equal scope.) Figure 4 provides a tree representation of the underspecified outscope relation induced by the =q constraints; dashed lines indicate that there may be intervening semantic material between the operator and what it outscopes.

(37)	Every dog chases some cat.

$$\text{(38)}\;\Big\langle\,\begin{bmatrix}\textit{every\_rel}\\ \text{BODY } 1\end{bmatrix}\!,\ \begin{bmatrix}\textit{chase\_rel}\\ \text{LBL } 3\end{bmatrix}\!,\ \begin{bmatrix}\textit{some\_rel}\\ \text{BODY } 2\end{bmatrix}\Big\rangle$$


Semantic composition within MRS is relatively simple and is stated in (39) (Copestake et al. 2005: 313–314); the third clause of this semantic composition rule amounts to a case-based definition, as is true of all Semantics Principles since Pollard & Sag (1987), since different constructions determine the HOOK of the head daughter differently (Copestake et al. 2005 only discuss intersective and scopal constructions in their paper).<sup>7</sup>

<sup>6</sup>Copestake et al. (2005) do not explicitly require the nuclear scope of generalized quantifiers to outscope the predicate denoted by the verb they are syntactic dependents of, as it follows from some general assumptions about the structure of fully resolved MRSs. Since Lexical Resource Semantics does so, we include the additional 1 =q 3 and 2 =q 3 constraints in the text and their effect in Figure 4 for ease of comparison. Nothing critical hinges on this issue.

<sup>7</sup>A slot in clause 4 of (39) is defined as "a semantic argument position in a word or phrase A that is associated with syntactic constraints on the word or phrase B whose semantics will supply that argument when the relevant grammar rule combines A and B" (Copestake et al. 2005: 313).


Figure 4: A graphical representation of the underspecified scope relations induced by =q constraints for sentence (37)

(39)
- 1. The RELS of a phrase is the concatenation (append) of the RELS values of the daughters.
- 2. The HCONS of a phrase is the concatenation (append) of the HCONS values of the daughters.
- 3. The HOOK of a phrase is the HOOK of the semantic head daughter, which is determined uniquely for each construction type.
- 4. One slot of the semantic head daughter of a phrase is identified with the HOOK of the other daughter.
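Clauses 1–3 of (39) translate almost directly into code; clause 4 is simplified below to plugging the non-head's index into one open argument slot of the head. This is our own schematic rendering with invented example fragments, not the composition algebra of Copestake et al. (2005).

```python
def compose(head, nonhead, slot):
    """Combine two MRS fragments following (39)."""
    rels = head["rels"] + nonhead["rels"]        # clause 1: append RELS
    hcons = head["hcons"] + nonhead["hcons"]     # clause 2: append HCONS
    for _, args in head["rels"]:                 # clause 4: fill one slot
        if args.get(slot) is None:
            args[slot] = nonhead["hook"]["index"]
            break
    return {"hook": head["hook"],                # clause 3: head's HOOK
            "rels": rels, "hcons": hcons}

vp = {"hook": {"index": "e1", "ltop": "8"},
      "rels": [("sleep_rel", {"ARG1": None})], "hcons": []}
subj = {"hook": {"index": "3", "ltop": "2"},
        "rels": [("every_rel", {"RSTR": "4"})], "hcons": [("4", "6")]}
sentence = compose(vp, subj, "ARG1")
assert sentence["rels"][0][1]["ARG1"] == "3"     # subject index plugged in
```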

This quite brief description of MRS illustrates what is attractive about it from an engineering point of view. Semantic composition is particularly simple: concatenation of lists (lists of elementary predications and of constraints), percolation of the HOOK from the semantic head, and a general constraint on connectedness between the head daughter and the non-head daughter. Furthermore, resolving scope means adding =q constraints to a list of =q constraints, thus avoiding traversing the semantic tree to check on scope relations. Finally, a flat representation makes translation easier, as argued in Copestake et al. (1995), and has several other advantages from an engineering perspective, as detailed in Copestake (2009). The ease flat representations provide comes at a cost, though, namely that semantic representations are cluttered with uninterpretable symbols (handles) and, more generally, do not correspond to sub-pieces of a well-formed formula. For example, we would expect the values of a quantifier's restriction and nuclear scope to be, say, formulas denoting sets (as per Barwise & Cooper 1981), not pointers to or labels of predications. This is not to say that a compositional, "standard" interpretation of MRS structures is not possible (see, for example, Copestake et al. 2001); it is rather that the model-theoretic interpretation of MRS requires adding hooks and holes to the model, abstract objects of dubious semantic import. While it is true, as Copestake et al. point out, that abstract objects have been included in the models of other semantic approaches, Discourse Representation Theory (DRT) in particular (Zeevat 1989), abstract objects in compositional specifications of DRT and other such dynamic semantic approaches are composed of semantically interpretable objects. In the case of DRT, the set of variables (discourse referents) that form the other component of semantic representations (aside from predicative conditions) are anchored to individuals in the "traditional" model-theoretic sense. Holes and hooks, on the other hand, are not necessarily so anchored, as labels (handles) do not have any interpretation in the universe of discourse.

An example of the model-theoretic opacity of handles is provided by the compositional semantics of intersective attributive adjectives. The RELS value of *white horse*, for example, is as shown in (40) (after the two predications' labels have been identified by the meaning composition performed by the *intersective_phrase* rule, of which (intersective) adjectival modification is a subsort).

$$\text{(40)}\;\Big\langle\,\begin{bmatrix}\textit{white\_rel}\\ \text{LBL } 1\\ \text{ARG0 } 2\end{bmatrix}\!,\ \begin{bmatrix}\textit{horse\_rel}\\ \text{LBL } 1\\ \text{ARG0 } 2\end{bmatrix}\Big\rangle$$
The fact that the value of ARG0 is the same for both elementary predications (2) is model-theoretically motivated: both properties are predicated of the same individual. The fact that the value of LBL is identical (1) is also motivated if labels are used to help determine the scope of quantifiers; in a quantified phrase like *every white horse*, the content of *white* and *horse* conjunctively serve as the restriction of the *every_rel* represented in (41).

$$\text{(41)}\;\begin{bmatrix}\textit{every\_rel}\\ \text{RSTR } 1\\ \text{BODY } \textit{handle}\end{bmatrix}$$


But the identity of the two elementary predications' labels is not *directly* model-theoretically motivated. It is a consequence of the semantic representation language that is used to model the meaning of sentences, not a consequence of the sentences' truth conditions.

# **6.2 Lexical Resource Semantics**

Whereas MRS emphasizes underspecification in semantic representations and expresses the syntax of underspecified representations in HPSG as typed feature structures, LRS focuses primarily on fine-grained linguistic analyses with explicit higher-order logics for meaning representation and utilizes underspecification prominently in the architecture of the syntax-semantics interface. Instead of encoding underspecified representations as denotations of grammar principles, it uses the feature logic itself as a tool for underspecifying fully specific logical representations in the symbolic languages of the literature on formal semantics. This means that a grammar with LRS semantics denotes sets of syntactic structures that comprise unambiguous meaning representations in a standard logical language, but it does so by means of underspecification in the grammar principles. By formulating very general ("underspecified") grammar principles which define the relationship between syntactic structure and semantic representation, LRS follows the lead of HPSG syntax. Grammar principles may admit a large number of structures, which in this case can be multiple semantic representations compatible with one and the same syntactic structure. An LRS analysis may then represent the readings of a sentence with two generalized quantifiers like (31), *every dog chased some cat* (repeated below as (42)) – i.e. the two readings shown in (43) – as distinct possible values of a semantics feature.

(42)	Every dog chased some cat.

(43)	a. every(λx.dog(w)(x))(λx.some(λy.cat(w)(y))(λy.chase(w)(x)(y)))

	b. some(λy.cat(w)(y))(λy.every(λx.dog(w)(x))(λx.chase(w)(x)(y)))
The syntactic format of semantic representations is flexible and can be adapted to the purposes of the linguistic analysis at hand. While (43) chooses predicates with an argument for possible worlds, lambda abstraction over the unary predicates which translate the nominal arguments, and categorematic quantifiers of type ⟨⟨e,t⟩, ⟨⟨e,t⟩, t⟩⟩, in many contexts less elaborate representations will suffice, and the two readings would be rendered in a notational variant of first-order languages. Other phenomena might necessitate more semantic structure. The LRS framework makes a selection of choices available to linguists to decide what is most adequate to spell out a semantic analysis.

### **6.2.1 Basic architecture**

Lexical items contribute semantic resources to utterances; every semantic representation of an utterance must use up all and only the semantic resources provided by the lexical items in the utterance in all their legitimate combinations.<sup>8</sup> What is legitimate is determined by semantic principles which restrict at each phrase how the semantic resources of its daughters may be combined. Anything these restrictions do not rule out is permitted. Scope ambiguities between co-arguments of a verb can be seen as arising from the lack of a principled restriction to the effect that one outscopes the other. In the absence of restrictions, LRS expects ambiguity. As a special property setting LRS apart from other semantic underspecification frameworks, LRS exploits HPSG's notion of structure sharing in its semantic representations by permitting the semantic contributions of different lexemes to be identical. For example, if two words in a clause contribute negation in their meaning, the two negations may in fact turn out to be the same negation, in which case we observe a negative concord reading. The implementation of this idea is based on the fundamental structure-sharing mechanism of HPSG, which is available throughout all levels of grammatical description.
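The resource discipline (every contribution used, and used exactly once, up to identification) can be pictured as a multiset comparison. The sketch below is our own; in LRS itself this bookkeeping is done by the PARTS list and the EXCONT Principle, not by list comparison. It anticipates the Polish negative concord case discussed later in this section.

```python
def resources_check(contributed, reading):
    """All and only the contributed resources occur in the reading."""
    return sorted(contributed) == sorted(reading)

# 'nikt nie przyszedl': both the n-word and the negated verb contribute
# a negation; structure sharing may identify the two contributions.
nikt = ["neg", "exists(x, person(x), _)"]
nie_przyszedl = ["neg", "come(x)"]

# After identifying the two negations, a single "neg" remains:
identified = ["neg", "exists(x, person(x), _)", "come(x)"]
single_negation_reading = ["neg", "exists(x, person(x), _)", "come(x)"]
assert resources_check(identified, single_negation_reading)

# Without identification, the double-negation resource multiset does
# not match the single-negation reading:
assert not resources_check(nikt + nie_przyszedl, single_negation_reading)
```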

The combinatorial semantics of phrases is encoded with structures of sort *lrs*:

$$\text{(44)}\;\begin{bmatrix}\text{SEM}\begin{bmatrix}\textit{lrs}\\ \text{EXCONT } \textit{me}\\ \text{INCONT } \textit{me}\\ \text{PARTS } \textit{list(me)}\end{bmatrix}\end{bmatrix}$$

Signs have an attribute SEMantics with value *lrs*. External content (EXCONT) and internal content (INCONT) designate two prominent aspects of the semantics of signs. Both of these attributes have values of sort *meaningful_expression*, for short *me*. The attribute EXCONT contains a term that represents the meaning of the maximal syntactic projection of the sign and is built from semantic material contributed within the projection. The INCONT is that part of a lexical sign's representation which is outscoped by any scope-taking operator that it combines with within its syntactic projection. The PARTS list records all semantic resources contributed by a given sign. The LRS Projection Principle in (45a) governs the percolation of these attribute values along the syntactic head path of phrases, whereas the EXCONT and INCONT Principles in (45b–c) determine the relationship of the respective attribute values to other semantic attribute values within local syntactic trees. The most important relationships are those of term identity and of subtermhood of one term relative to another or to some designated part of another term. Subterm restrictions are in essence similar to the *qeq* constraints of MRS.

<sup>8</sup>Lexical items may be phrasal.

(45) a. LRS Projection Principle

In each phrase,

1. the EXCONT values of the head daughter and the mother are identical,
2. the INCONT values of the head daughter and the mother are identical,
3. the PARTS list of the mother contains all elements of the PARTS lists of the daughters.

b. INCONT Principle

In each *lrs*, the INCONT value is an element of the PARTS list and a component of the EXCONT value.

c. EXCONT Principle

First, in every phrase, the EXCONT value of the non-head daughter is an element of the non-head daughter's PARTS list. Second, in every utterance, every subexpression of the EXCONT value of the utterance is an element of its PARTS list, and every element of the utterance's PARTS list is a subexpression of the EXCONT value.
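Under the simplifying assumption that representations are strings and subtermhood is substring containment (real LRS manipulates term structure in the feature logic), the INCONT Principle and the closure part of the EXCONT Principle can be checked over (44)-style objects as follows; the class and the example are ours.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LRS:
    excont: str        # EXCONT: meaning of the maximal projection
    incont: str        # INCONT: the head's minimal-scope contribution
    parts: List[str]   # PARTS: all contributed resources

def incont_principle(sign: LRS) -> bool:
    # INCONT is on PARTS and is a component of EXCONT.
    return sign.incont in sign.parts and sign.incont in sign.excont

def excont_closure(utterance: LRS) -> bool:
    # Every element of PARTS is a subexpression of the utterance's EXCONT.
    return all(p in utterance.excont for p in utterance.parts)

s = LRS(excont="forall x (dog(x), bark(x))",
        incont="bark(x)",
        parts=["forall x (dog(x), bark(x))", "dog(x)", "bark(x)"])
assert incont_principle(s) and excont_closure(s)
```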

The Projection Principle guarantees the percolation of EXCONT and INCONT values along the head path of syntactic phrases, and it records the semantic resources available at each phrase based on the semantic contributions of their daughters (45a). The INCONT Principle and the EXCONT Principle manage the properties of the respective attribute values. The term with minimal scope of each lexeme must be contributed by the lexeme itself and must be semantically realized within the representation of the maximal syntactic head projection (45b). The maximal semantic meaning contribution of a maximal syntactic projection must originate from within that maximal projection, and an utterance (as a distinguished maximal projection) consists of all and only those pieces of semantic representation which are contributed by some lexeme in the utterance (45c). The meaning of an utterance is given by the semantic representation which is its EXCONT value. An ambiguous utterance receives structural analyses that are potentially only distinguished by different EXCONT values of their root node.


The constraints in (45) take care of the integrity of the semantic combinatorics. The task of the clauses of the Semantics Principle is to regulate the semantic restrictions on specific syntactic constructions (as in all previously discussed versions of semantics in HPSG). A quantificational determiner, represented as a generalized quantifier, which syntactically combines as non-head daughter with a nominal projection, integrates the INCONT of the nominal projection as a subterm into its restrictor and requires that its own INCONT (containing the quantificational expression) be identical with the EXCONT of the nominal projection. This clause makes the quantifier take wide scope in the noun phrase and forces the semantics of the nominal head into the restrictor. In (43) we observe the effect of this clause by the placement of the predicate dog in the restrictor of the universal and the predicate cat in the restrictor of the existential quantifier.

Another clause of the Semantics Principle governs the combination of quantificational NP arguments with verbal projections. If the non-head of a verbal projection is a quantificational NP, the INCONT of the verbal head must be a subexpression of the scope of the quantifier. Since this clause does not require immediate scope, other quantificational NPs which combine in the same verbal projection may take scope in between, as we can again see with the two possible scopings of the two quantifiers in (43), in particular in (43b), where the subject quantifier intervenes between the verb and the object quantifier.

The local semantics of signs is split from the combinatorial *lrs* structures in parallel to the separation of local syntactic structure from the syntactic tree structure. The local semantics remains under the traditional CONTENT attribute, where it is available for lexical selection by the valence attributes. The LOCAL value of the noun *dog* in (46) illustrates the relevant structure:

$$\text{(46)}\;\begin{bmatrix}\text{CAT}\begin{bmatrix}\text{HEAD } \textit{noun}\\ \text{SPR } \big\langle\,\text{DET}_{1}\,\big\rangle\end{bmatrix}\\ \text{CONT}\begin{bmatrix}\text{DR } 1\ x\\ \text{MAIN dog}\end{bmatrix}\end{bmatrix}$$
The attribute DISCOURSE-REFERENT (DR) contains the variable that will be the argument of the unary predicate dog, which is the MAIN semantic contribution of the lexeme. The variable, x, does not come from the noun but is available to the noun by selection of the determiner by the valence attribute SPR. The subscripted tag 1 on the SPR list indicates the identity of DR values of the determiner and the nominal head *dog*. A principle of local semantics says that MAIN values and DR values are inherited along the syntactic head path.

Figure 5: Combining the meaning of *every* and *dog*

The semantics of phrases follows from the interaction of the (lexical) selection of local semantic structures and the semantic combinatorics that results from the principles in (45) and the clauses of the Semantics Principle. For ease of readability, Figure 5 omits the lambda abstractions from the generalized quantifier, chooses a notation from first-order logic, and does not make all structure sharings between pieces of the logical representation explicit. The head noun *dog* contributes, on its PARTS list (PTS), the predicate dog and the application of the predicate to a lexically unknown argument 2, identical with the DR value of *dog*. As shown in (46), the DR value of the noun is shared with the DR value of the selected determiner, which is the item contributing the variable x to the representation. In addition, *every* contributes the quantifier and the application of the quantifier to its arguments. The clause of the Semantics Principle which restricts the combination of quantificational determiners with nominal projections identifies the INC of *every* with the EXC of *dog* and requires that the INC of *dog* (3) be a subterm of the restrictor of the quantifier (notated as '3 ⊳ α', conjoined to the AVM describing the phrase, where α stands for the restrictor). The identification of the EXC and INC of *every* follows from (45b–c). According to this analysis, the semantic representation of the phrase *every dog* is a universal quantification with dog(x) in the restrictor and unknown scope (β). The scope will be determined when the noun phrase combines with a verb phrase. For example, such a verb phrase could be *barks*, as in Figure 6. If its semantics is represented as a unary predicate bark, the predicate and its application to a single argument are contributed by the verb phrase, and local syntactic selection of the subject *every dog* by the verb *barks* identifies this argument as the variable x, parallel to the selection of the quantifier's variable by *dog* above. The relevant clause of the Semantics Principle requires that bark(x) be a subterm of β, and the EXC, 2, of the complete sentence receives the value ∀x (dog(x), bark(x)) as the only available reading, in accordance with the EXCONT Principle.

Figure 6: Combining the meaning of *every dog* and *barks*

In Figure 6, the identity of the restrictor of the universal quantifier with 3 dog(x) and that of its scope with 1 bark(x) are determined at the utterance level by the lack of other material that could be added to the two arguments of the quantifier. For example, an extraposed relative clause belonging to *every dog* could consistently contribute its meaning representation to the restrictor, and only the absence of such additional semantic material leads to the inferred identity of 3 with α.
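The utterance-level inference just described can be emulated by trying every way of distributing the remaining material over the quantifier's restrictor and scope, subject to the collected subterm constraints; for Figure 6 exactly one candidate survives. A toy rendering (ours), with '⊳' approximated by a membership test:

```python
from itertools import permutations

# Material that must end up inside the quantifier in "Every dog barks",
# with the subterm constraints dog(x) ⊳ restrictor and bark(x) ⊳ scope.
material = ["dog(x)", "bark(x)"]

readings = [f"forall x ({r}, {s})"
            for r, s in permutations(material, 2)
            if "dog(x)" in r and "bark(x)" in s]

# With no further material, the constraints leave exactly one reading.
assert readings == ["forall x (dog(x), bark(x))"]
```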

Underspecification of the structure of meaning representations in the clauses of the Semantics Principle and in lexical entries interacts with the possibility of structure sharing. If two pieces of meaning representation have the same shape and obey compatible structural conditions (as determined by the relevant subterm constraints), they can be identical. More strongly, in certain grammatical constellations, principles of grammar may require their identity. Lexical underspecification of meaning contributions moreover permits the shared construction of functors, such as the construction of a polyadic quantifier from several lexical items in a sentence. These two applications of LRS lead to new possibilities of semantic composition compared to standard compositional semantics in Mainstream Generative Grammar, because functors can be composed in (logical) syntax which cannot be semantically decomposed or cannot be decomposed within the structural limits of a surface-oriented syntax, i.e. a syntactic structure which only reflects syntactic but not semantic composition.

Consider the semantic representation of the Polish sentence *nikt nie przyszedł* 'nobody came' in Figure 7.

Figure 7: Combining the meaning of Polish *nikt* and *nie przyszedł*

Negated finite verbs in Polish contribute a negation that must be realized within the verb's EXCONT (4b ⊳ 1) and that outscopes the INCONT of the verb. Similarly, the existential quantifier of the n-word *nikt* 'nobody' is outscoped by the negation. However, in addition to the familiar restriction that applies when the quantificational subject combines with the finite verb, Polish, as a strict negative concord language, requires that a negated finite verb be in the scope of at most one negation within its EXCONT, entailing the identity of the two negations (2c = 4b) and the single-negation reading *nobody came* as the only admissible reading of the sentence shown in Figure 7. To capture obligatory negation marking on finite verbs in Polish, a second principle of negative concord rules that if a finite verb is in the scope of negation in its EXCONT, it must itself be a contributor of negation (Richter & Sailer 2004b: 316). The resulting EXCONT value in Figure 7 is ¬∃x (person(x), come(x)).

The idea of identifying contributions from different constituents in an utterance is even more pronounced in cases of irreducible polyadic quantification. The reading of (47a) in which each unicorn from a collection of unicorns has a set of favorite meadows that is not the same as the set of favorite meadows of any other unicorn is known to be expressible by a polyadic quantifier taking two sets and a binary relation as arguments (47d), but it cannot be expressed by two independent monadic quantifiers (Keenan 1992).

(47)	a. Every unicorn prefers different meadows.

	b. *different meadows*: (Q₀, Δ)(P₁, λy.meadow(y), S₁)

	c. *every unicorn*: (∀, Q)(λx.unicorn(x), P₂, S₂)

	d. (∀, Δ)(λx.unicorn(x), λy.meadow(y), λx λy.prefer(x, y))

(47) sketches the LRS solution to this puzzle proposed in Richter (2016). The adjective *different* contributes an incomplete polyadic quantifier of the appropriate type which integrates the representation of the nominal head of its NP into the second restrictor but leaves open a slot in the representation of its functor for another quantifier it must still combine with (47b). The determiner *every* underspecifies the realization of its quantifier in such a way that one of the possible representations yields (47c) for *every unicorn*, which is exactly of the right shape to be identified with the representation of *different meadows*, leading to the expression in (47d) for (47a). Lahm (2016) presents an alternative account of such readings with *different* using Skolem functions, which also hinges on LRS-specific techniques. Iordăchioaia & Richter (2015) study Romanian negative concord constructions and represent their readings using polyadic negative quantifiers; Lahm (2018) develops a lexicalized theory of plural semantics.

### **6.2.2 Representation languages and notational conventions**

Any LRS grammar relies on an encoding of the syntax of an appropriate semantic representation language in the feature logic. In principle, any finitary logical language can be encoded in Relational Speciate Re-entrant Language (RSRL), which covers every language that has been proposed for meaning representations in linguistics. Work in LRS has so far been couched mostly in variants of Two-sorted Type Theory (Ty2, Gallin 1975), one of the standard languages of formal semantics, or in Montague's Intensional Logic. The type system of these logical languages is useful for underspecified descriptions in semantic principles, since relevant groups of expressions can be generalized over by their type without reference to their internal structure. For example, a clause of the Semantics Principle can use the type of generalized quantifiers to distinguish quantificational complement daughters of verbal projections and state the necessary restrictions on how they are integrated with the semantics of the verbal head daughter, while other types of complement daughters are treated differently and may not even be restricted at all by a clause of the Semantics Principle in how they integrate with the verbal semantics. The latter is often the case with proper names and definite descriptions, which can be directly integrated with the semantics of the verb by lexical argument selection.

Encodings of semantic representations in feature logic are usually assumed as given by the background LRS theory. Examples of encodings can be found in Sailer (2000) and Richter (2004). Sailer (2000) offers a correspondence proof of the encoded structures with a standard syntax of languages of Ty2. As descriptions of logical terms in literal feature logic are very cumbersome to read and write and offer no practical advantage or theoretical insight, all publications use notational shortcuts and employ logical expressions with metavariables for their descriptions instead. As nothing depends on feature logical notation, the gain in readability outweighs any concerns about notational precision.

# **7 Conclusion**

Semantics in HPSG has undergone significant changes over the past three decades, and the analyses couched in the different semantic theories have been concerned with a wide variety of semantic phenomena. Two common denominators of the approaches are the relative independence of syntactic and semantic structure, in the sense that the syntactic tree structure is never meant to mirror directly the shape of the syntax of semantic expressions, and the use of HPSG-specific techniques to characterize semantic expressions and their composition along the syntactic tree structure. Of particular relevance here is the use of a rich sort hierarchy in the specification of semantic structures and the use of underspecification in determining their shape, as these two aspects of the HPSG framework play a prominent and distinguishing role in all the semantic theories. The flexibility of these tools makes HPSG suitable for the integration of very diverse theories of the meaning of natural languages while respecting representational modularity, i.e. the assumption that distinct kinds of information associated with strings (e.g. inflectional information, constituency, semantic information) are not reflected in a single kind of syntactic information, say tree configurations, as is typically assumed in Mainstream Generative Grammar.

# **Acknowledgments**

We thank Jonathan Ginzburg and Stefan Müller for many careful critical remarks which helped us improve this chapter considerably. We thank Elizabeth Pankratz for editorial comments and proofreading.

# **References**



# **Chapter 23**

# **Information structure**

# Kordula De Kuthy

Universität Tübingen

Information structure as the hinge between sentence and discourse has been at the center of interest for linguists working in areas as different as semantics, syntax and prosody for several decades. A constraint-based grammar formalism such as HPSG, which encodes multiple levels of linguistic representation within the architecture of signs, opens up the possibility of elegantly integrating such information about discourse properties into the grammatical architecture. In this chapter, I discuss a number of approaches that have explored how to best integrate information structure as a separate level into the representation of signs. I discuss which lexical and phrasal principles have been implemented in these approaches and how they constrain the distribution of the various information structural features. Finally, I discuss how the various approaches are used to formulate theories about the interaction of syntax, prosody and information structure. In particular, we will see several cases where (word order) principles that used to be stipulated in syntax can now be formulated as an interaction of syntax and discourse properties.

# **1 Introduction**

The *information structure* of a sentence captures how the meaning expressed by the sentence is integrated into the discourse. *Information structure* thus encodes which part of an utterance is informative in which way, in relation to a particular context. A wide range of approaches exists with respect to the question of what should be regarded as the primitives of the information structure.

It is now commonly assumed that there are three basic dimensions of information structure that are encoded in natural languages and that serve as its basic primitives: (i) a distinction between what is new information advancing the discourse (*focus*) and what is known, i.e., anchoring the sentence in existing (or presupposed) knowledge or discourse (*background*), (ii) a distinction between what the utterance is about (*topic*, *theme*) and what the speaker has to say about it (*comment*, *rheme*), and (iii) a dimension referred to as *information status*, where entities that have already been mentioned in the discourse (*given*) are distinguished from those that have not been mentioned (*new*).<sup>1</sup> For all three ways of partitioning the information structure, we find approaches within the HPSG framework. Example (1) illustrates how one utterance in the context of a question can be structured according to different partitionings of information structure.

Kordula De Kuthy. 2021. Information structure. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1043–1078. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599864

(1) Q: What does Sarah drink?


The focus/background division with focus as the part of an utterance that is informative with respect to the discourse is one of the most commonly adopted partitionings when studying information structure, and thus many approaches within the HPSG architecture assume a division into focus and background, such as the ones that will be discussed in this article: Engdahl & Vallduví (1996), De Kuthy (2002), Webelhuth (2007), Paggio (2009), Bildhauer (2008), Song & Bender (2012) and Song (2017). Less common within the HPSG framework are approaches that take topic, i.e., the material that an utterance is about, as the central notion and assume topic and comment (or theme and rheme) as the primitives of the information structure. Most approaches discussed here assume that the background has one designated (mostly referential) element functioning as the topic (or link), among them Engdahl & Vallduví (1996), De Kuthy (2002), Paggio (2009) and Song (2017).

With respect to information status (including primitives such as new and given mentioned above), the discourse status of referential elements is of interest: whether they can be linked to previously mentioned items and are thus (discourse) old or given, or whether they have not been mentioned before and are thus (discourse) new. The representation of information status has received comparatively little interest within the HPSG community; the approach by De Kuthy & Meurers (2011) is one of the few that explicitly integrate this dimension into their information structural architecture.

The need to represent discourse properties within a grammar architecture results from the insight that in many, if not all, languages, the way utterances are realized via their syntactic structure, morphological patterns and prosody very often interacts with the discourse requirements of these utterances. In other words, approaches dealing with constraints on word order in a particular construction need to encode that this particular word order is only grammatical given a particular context, or that a particular accent pattern has to be connected to a particular discourse status of the accented elements.<sup>2</sup> Most of the approaches discussed here deal with such interface questions, and I therefore discuss the particular word order and phonetic theories that have been implemented in detail in Sections 6 and 7. As a starting point, however, I will first discuss the various architectural designs that have been implemented in order to be able to formulate the specific theories integrating discourse constraints into the grammar architecture.

<sup>1</sup>For a comprehensive overview of the different research strands with respect to the information structural dimension, see Kruijff-Korbayová & Steedman (2003).

# **2 Information structure in the architecture of signs**

Several ways of representing information structure within the architecture of signs have been pursued as part of the HPSG framework. One of the earliest approaches, which is similar to the idea of F-marking as pursued in many syntax-based approaches to information structure in Generative Grammar (such as Jackendoff 1972; Selkirk 1984), was proposed by Manandhar (1994b). He assumes that all signs have an additional feature INFO-STRUC which takes as its value objects of the sort *info-type*. A sign can then have one of the subtypes of *info-type* shown in Figure 1 as its informational marking.

Figure 1: Type hierarchy under *info-type* of Manandhar (1994b: 83)

The distribution of the INFO-STRUC values in a sign is determined by the *Focus Inheritance Principle*, which enforces that, in every phrase, the INFO-STRUC value of the mother subsumes the INFO-STRUC values of all of its daughters. The consequence of this principle is that if one daughter in a phrase is in the focus and the other one in the background, then the mother's INFO-STRUC value is the smallest common supertype of both, namely *info-type*.
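The principle's effect can be computed as a least-upper-bound operation over the type hierarchy. The sketch below uses only a two-member fragment of the hierarchy in Figure 1 (the figure is not reproduced here, so only the subtypes the text mentions appear):

```python
# A fragment of the info-type hierarchy: focus and background are
# subtypes of info-type (Figure 1 contains further subtypes).
PARENT = {"focus": "info-type", "background": "info-type", "info-type": None}

def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain

def smallest_common_supertype(t1, t2):
    up = set(ancestors(t2))
    return next(t for t in ancestors(t1) if t in up)

# A focussed daughter and a backgrounded daughter yield a mother
# marked with the most general type, info-type.
assert smallest_common_supertype("focus", "background") == "info-type"
assert smallest_common_supertype("focus", "focus") == "focus"
```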

There are two problematic aspects of such an architecture. Firstly, it leads to a proliferation of syntactic markup of non-syntactic properties, in particular once one considers the full range of information structural notions, such as focus and focus projection, multiple foci and the marking of other discourse functions such as topic. Secondly, the perspective of information structure as resulting from an independent interpretation process of syntactic markup does not support a view of syntax, information structure and intonation as directly interacting modules, a view that can be nicely implemented in a multi-layer framework such as HPSG. More common are thus approaches that encode the information structure as a separate layer, i.e., a feature with its own structural representation.

<sup>2</sup>For some examples in the literature where this has been explored for word order phenomena, see for example Ambridge & Goldberg (2008), De Kuthy & Konietzko (2019) and Culicover & Winkler (2019).

In the original setup of signs introduced in Pollard & Sag (1994), the feature CONTEXT is part of *local* objects and serves as a place to encode information relating to the pragmatic context (and other pragmatic properties) of utterances. In Engdahl & Vallduví (1996) it is argued that it would be most natural to also represent information structure as part of this CONTEXT feature. Engdahl & Vallduví (1996) thus introduce the feature INFO-STRUC as part of the CONTEXT, and since they couch their approach in Vallduví's (1992) information packaging terms, INFO-STRUC is further divided into FOCUS and GROUND. All INFO-STRUC features take entire signs as their values. The complete specification is shown in (2).

(2)	Information structure in Engdahl & Vallduví (1996: 56):

$$\begin{bmatrix}\textit{sign}\\ \text{SYNSEM|LOCAL|CONTEXT|INFO-STRUC}\begin{bmatrix}\text{FOCUS } \textit{sign}\\ \text{GROUND}\begin{bmatrix}\text{LINK } \textit{sign}\\ \text{TAIL } \textit{sign}\end{bmatrix}\end{bmatrix}\end{bmatrix}$$
Another approach locating the representation of information structure within the CONTEXT feature is the one proposed by Paggio (2009) as part of a grammar of Danish. The INFO-STRUC features TOPIC, FOCUS and BG take as their values lists of indices. Since Paggio (2009) uses Minimal Recursion Semantics (MRS, Copestake et al. 2005) as the semantic representation framework,<sup>3</sup> these indices can be structure-shared with the argument indices of the semantic relations collected on the RELS list of the content of a sign. The basic setup is illustrated in (3).

(3)	Information structure in Paggio (2009: 149):

$$\begin{bmatrix}\textit{sign}\\ \text{SYNSEM|LOCAL|CONTEXT|INFOSTR}\begin{bmatrix}\text{FOCUS } \textit{list-of-indices}\\ \text{TOPIC } \textit{list-of-indices}\\ \text{BG } \textit{list-of-indices}\end{bmatrix}\end{bmatrix}$$

<sup>3</sup>A detailed discussion of the properties and principles of MRS as implemented in HPSG can be found in Koenig & Richter (2021: Section 6.1), Chapter 22 of this volume.


Several approaches encode information structure as part of the CONTENT, such as Song (2017) and Song & Bender (2012). Since they also use MRS as the semantic representation language, they enrich the architecture of *mrs* structures. The information structure itself is encoded via a feature ICONS (individual constraints) that is introduced parallel to HCONS (handle constraints) as part of the CONTENT, as shown in (4). Song (2017) and Song & Bender (2012) use *diff-list* as values for the features RELS, HCONS and ICONS (expressed by the "!" at the beginning and the end of the list). This type of list includes an explicit pointer to the last element of the list.

(4) Information structure in Song & Bender (2012) and Song (2017: 116):

  [*mrs*
   RELS <! … !>
   HCONS <! … !>
   ICONS <! … !>]

The type *info-str* used as the value for elements on the ICONS list is divided into an elaborate hierarchy with several subtypes, such as *semantic-focus, contrast-focus, focus-or-topic, non-focus*, etc. (cf. Song 2017: 114). The elements of type *info-str* on the ICONS list have two appropriate features CLAUSE and TARGET. TARGET is always structure-shared with the respective sign's ARG0 value, and the value of CLAUSE is always structure-shared with the INDEX value of the predicate that is the semantic head of the clause.
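The practical point of a diff-list is that the pointer to the last element makes appending at the end of the list a constant-time operation. The following minimal Python sketch illustrates this bookkeeping; the class and the dictionary encoding of ICONS elements are hypothetical illustrations, not Song's actual implementation:

```python
class Node:
    """Singly-linked list cell."""
    def __init__(self, value, nxt=None):
        self.value, self.next = value, nxt

class DiffList:
    """A list with an explicit pointer to its last cell, so that
    appending (e.g. a new ICONS element) is O(1)."""
    def __init__(self):
        self.first = None
        self.last = None

    def append(self, value):
        cell = Node(value)
        if self.last is None:
            self.first = cell
        else:
            self.last.next = cell
        self.last = cell

    def to_list(self):
        out, cell = [], self.first
        while cell is not None:
            out.append(cell.value)
            cell = cell.next
        return out

# ICONS elements pair a TARGET index with the CLAUSE index
# (hypothetical encoding of the feature structure as a dict).
icons = DiffList()
icons.append({"type": "semantic-focus", "TARGET": "x4", "CLAUSE": "e2"})
icons.append({"type": "non-focus", "TARGET": "x9", "CLAUSE": "e2"})
print(icons.to_list())
```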

As pointed out by De Kuthy (2002), assuming that the information structure is part of *local* objects (which it is if it is part of the CONTEXT in HPSG as proposed by Engdahl & Vallduví 1996 or part of the CONTENT) is problematic in connection with a trace-based account of unbounded dependency constructions (UDCs). Traces should not contribute anything to the information structure of a sentence. If one wants to develop an information structure approach which is independent of the decision of which kind of UDC theory one assumes, the only options for placing the information structure attribute are under *synsem* objects or at the top level of signs.

Information structure as part of *synsem* objects would suggest that it plays a role in syntactic selection. This possibility is assumed in Bildhauer & Cook (2011), and they thus represent INFO-STRUC as a feature appropriate for *synsem* objects (their account will be discussed in more detail in Section 6). A third possibility is argued for in De Kuthy (2002) and Bildhauer (2008), namely that information structure should not be part of *synsem* objects. As a result, they encode information structure again as an additional feature of signs (similar to the approach by Manandhar 1994b discussed above), but it is argued that the appropriate values should be semantic representations. Using indices as the value of information structure-related features (as in the approaches by Paggio 2009, Song & Bender 2012 and Song 2017) is again problematic whenever two constituents share their index value, but only one of them is assigned a particular information structural function. For example, under the assumption that in a head-adjunct phrase the index is structure-shared between an intersective adjective and the nominal head (as in *red car*), there is no way to relate a particular information structure function (e.g., contrast) to the adjective alone (as in *RED car*).

In De Kuthy (2002), a tripartite partition of information structure into focus, topic and background is introduced. As to the question of what kinds of objects should be defined as the values of these features, De Kuthy proposes that the values of the INFO-STRUC features be chunks of semantic information. It is argued that the semantic representation proposed in Pollard & Sag (1994) is not appropriate for her purpose, because the semantic composition is not done in parallel with the syntactic build-up of a phrase. Instead, the Montague-style (cf. Dowty et al. 1981) semantic representation for HPSG proposed in Sailer (2000) is adopted, in which CONTENT values are regarded as representations of a symbolic language with a model-theoretic interpretation. As the semantic object language under CONTENT the language Ty2 (cf. Gallin 1975) of two-sorted type theory is chosen. The resulting feature architecture is shown in (5).

(5) The structure of INFO-STRUC in De Kuthy (2002: 165):

  [*sign*
   INFO-STRUC [FOCUS list(*me*)
               TOPIC list(*me*)]]
The information structure is encoded in the attribute INFO-STRUC, which is appropriate for signs and whose own attributes are FOCUS and TOPIC, with lists of so-called meaningful expressions (semantic terms, cf. Sailer 2000) as values. These meaningful expressions (that are also used as the representation of logical forms as


the CONT values) are lambda terms formulated in a predicate logic language as discussed in more detail in Section 3.2.2 in (12).

# **3 Information structure principles**

The approaches sketched above all assume that signs contain some kind of representation of information structure, with the consequence that they need to introduce principles that constrain the values of the information structural features. Most approaches thus formulate two types of principles as part of their grammar fragment: one set of principles at the lexical level tying information structure to word level properties such as accents, and another set of principles at the phrasal level determining the distribution of information structure values between mother and daughters in a phrase.

# **3.1 Instantiating information structure at the word level**

In the approach of Engdahl & Vallduví (1996), prosodic properties of English, in particular accent placement, are tied to specific information structural properties of words and phrases. At the word level, they introduce two principles that instantiate the information structure attributes FOCUS and LINK when the word has a particular accent. The two principles are shown in (6).

(6) Information structure of words (Engdahl & Vallduví 1996: 56):

  *word* ⇒ [1][PHON|ACCENT *A*
               INFO-STRUC|FOCUS [1]]

  *word* ⇒ [1][PHON|ACCENT *B*
               INFO-STRUC|GROUND|LINK [1]]

Words with an A accent always contribute focal information, i.e., the entire sign is structure-shared with the INFO-STRUC|FOCUS value; words carrying a B accent contribute link information, i.e., the entire sign is structure-shared with the INFO-STRUC|GROUND|LINK value.<sup>4</sup>
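Procedurally, the two constraints in (6) inspect a word's accent and, if it carries an A or B accent, structure-share the whole word sign with the corresponding INFO-STRUC slot. A minimal Python sketch, where the dictionary encoding is a hypothetical stand-in for feature structures and structure sharing is modeled as object identity:

```python
def instantiate_word_info_struc(word):
    """Word-level INFO-STRUC instantiation in the spirit of (6):
    an A accent makes the word sign itself the FOCUS value,
    a B accent makes it the GROUND|LINK value."""
    accent = word.get("accent")
    if accent == "A":
        word["info-struc"] = {"focus": word}  # sharing: the value IS the sign
    elif accent == "B":
        word["info-struc"] = {"ground": {"link": word}}
    return word

book = instantiate_word_info_struc({"phon": "BOOK", "accent": "A"})
print(book["info-struc"]["focus"] is book)  # True: token identity, not a copy
```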

A similar set of word level principles is introduced in the approach of De Kuthy (2002), where the information structure of utterances in German is also tied to words carrying particular accent patterns. The phonology of signs is altered as shown in (7) to include an ACCENT attribute to encode whether a word receives an accent or not, and whether it is a rising or falling accent, should it receive one.

<sup>4</sup>The usage of the terms "A accent" and "B accent" goes back to Jackendoff (1972: 259).


(7) Representing pitch accents and accent type hierarchy according to De Kuthy (2002: 166):

  [PHON [PHON-STR *list*
         ACCENT *accent*]]

  *accent* subsumes *unaccented* and *accented*; *accented* subsumes *rising-accent* and *falling-accent*.

The information structure of words is defined through the principle shown in (8), which assigns the semantic contribution of the word to the focus or topic specification in the information structure representation of that word, depending on the type of accent the word receives.

(8) Principle assigning information structure to words (De Kuthy 2002: 167):

*word* ⇒ [PHON|ACCENT *falling-accent*
          SS|LOC|CONT|LF [1]
          INFO-STRUC [FOCUS ⟨[1]⟩
                      TOPIC ⟨⟩]]
       ∨ [PHON|ACCENT *unaccented*
          INFO-STRUC [FOCUS ⟨⟩
                      TOPIC ⟨⟩]]
       ∨ …

Here only two cases are spelled out: one for *falling-accent*, signalling focus, and one for unaccented words, which contribute nothing to the information structure. Other possible cases could for example be a specific accent (like a fall-rise) signalling topic, i.e., a non-empty TOPIC list.

In the approach of Song (2017), lexical items are subtypes of four different *icons-lex-item* types, which specify whether lexical items can contribute any information structural information to the ICONS list, and if so, how many items can do this. These four lexical subtypes are shown in (9).

(9) Lexical types specifying ICONS values (Song 2017: 137):

a. *no-icons-lex-item*
   [MKG [FC *na*
         TP *na*]
    ICONS <! !>]

b. *basic-icons-lex-item*
   [ICONS <! !>]

c. *one-icons-lex-item*
   [ICONS <! [ ] !>]

d. *two-icons-lex-item*
   [ICONS <! [ ], [ ] !>]

Lexical entries for elements that cannot be marked with respect to information structure are of type *no-icons-lex-item*, such as relative pronouns or expletives in English. Song (2017: 121) introduces the morphosyntactic feature MKG as part of the SYNSEM|LOCAL|CAT value for the specification of information structural properties of lexical items, with the two appropriate features FC (FoCus-marked) and TP (ToPic-marked). The appropriate value of these two features is *luk*, which is a supertype of *bool* (boolean) and *na* (not-applicable). Since lexical entries of type *no-icons-lex-item* never have any information structural marking, the values for the features FC and TP are specified as *na*. Nominal items, such as common nouns, proper nouns and pronouns, have lexical entries of type *basic-icons-lex-item*. These types of words can have an information structural marking, but do not have to. The two other lexical subtypes are used for verbs with one clausal argument (*one-icons-lex-item*) or two clausal arguments (*two-icons-lex-item*). The information structural contribution of these clausal arguments then has to be part of the verb's ICONS list. All other verbs are not required to have any elements on their ICONS list and can thus also be of type *basic-icons-lex-item*.
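The division of labor among the four lexical types can be pictured as a simple well-formedness check over a word's MKG values and the length of its ICONS list. A minimal sketch under hypothetical encoding assumptions:

```python
# luk subsumes bool (+/-) and na ("not applicable"); hypothetical encoding.
LUK = {"+", "-", "na"}

# How many ICONS elements each lexical type requires (None = no requirement).
ICONS_ARITY = {
    "no-icons-lex-item": 0,        # e.g. relative pronouns, expletives
    "basic-icons-lex-item": None,  # may, but need not, contribute
    "one-icons-lex-item": 1,       # verbs with one clausal argument
    "two-icons-lex-item": 2,       # verbs with two clausal arguments
}

def well_formed(lex_type, mkg, icons):
    """Check a word's MKG and ICONS against its lexical type."""
    if not (mkg["FC"] in LUK and mkg["TP"] in LUK):
        return False
    if lex_type == "no-icons-lex-item":
        # never information-structurally marked
        return mkg == {"FC": "na", "TP": "na"} and icons == []
    arity = ICONS_ARITY[lex_type]
    return arity is None or len(icons) == arity

print(well_formed("no-icons-lex-item", {"FC": "na", "TP": "na"}, []))   # True
print(well_formed("one-icons-lex-item", {"FC": "+", "TP": "-"},
                  [{"type": "info-str"}]))                              # True
```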

To capture further constraints on the information structure properties at the word level, such as accent patterns triggering focus or topic, lexical rules are formulated in Song (2017) that derive lexical entries with the respective specifications. One such set of lexical rules for A and B accents in English is discussed in Section 7.

# **3.2 Information structure principles at the phrasal level**

### **3.2.1 Information packaging (Engdahl & Vallduví 1996)**

In one of the first approaches integrating an explicit representation of information structure into the HPSG architecture, Engdahl & Vallduví (1996) encode the information structure as part of the CONTEXT of signs with the help of an additional feature INFO-STRUC. As discussed above, at the lexical level the specification of these features can be triggered by phonetic properties, such as certain accents, for intonation languages like English. Phrasal signs must then satisfy the INFO-STRUC instantiation constraints in (10).<sup>5</sup>

(10) INFO-STRUC *instantiation principles for English:*

Either (i) if a DAUGHTER's INFO-STRUC is instantiated, then the mother inherits this instantiation (for narrow foci, links and tails),

or (ii) if the most oblique DAUGHTER's FOCUS is instantiated, then the FOCUS of the mother is the sign itself (wide focus).

An example including a wide VP focus licensed by the principle in (10) with the relevant INFO-STRUC values is shown in Figure 2.

<sup>5</sup>Engdahl and Vallduví's formulation of the principle is incompatible with the model-theoretic view of HPSG in Pollard & Sag (1994). Feature structures are complete models of objects, so there is no way for a value to be uninstantiated in a feature structure. Only descriptions of feature structures can be underspecified, not the feature structures themselves. See also Richter (2021), Chapter 3 of this volume.


Figure 2: An example for VP focus in Engdahl & Vallduví (1996: 59)

In this example, the rightmost NP daughter *the Delft China Set* carries an A accent. According to the principle in (6) shown earlier, the entire sign is thus structure-shared with the focus value (or, in Engdahl & Vallduví's terms, the focus value "is instantiated"). As a consequence, the second clause of the principle in (10) applies and the focus value of the VP mother is the sign itself, which is then inherited by the sentence. Several aspects of the licensing of the structure in Figure 2 are not properly spelled out in Engdahl & Vallduví's approach. For example, the analysis seems to presuppose a set of additional principles for focus inheritance in nominal phrases which do not straightforwardly follow from the principles formulated in (10).
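Read as a bottom-up procedure, the principle in (10) propagates INFO-STRUC values from the daughters to the mother, with wide focus triggered by the most oblique daughter. The following minimal Python sketch (hypothetical encoding; in the grammar the two clauses are disjunctive options rather than an ordered algorithm) replays the wide VP focus of Figure 2:

```python
def project_info_struc(mother, daughters):
    """INFO-STRUC instantiation in the spirit of (10). This sketch simply
    tries the wide-focus clause first; daughters are assumed ordered from
    least to most oblique."""
    most_oblique = daughters[-1]
    # Clause (ii): if the most oblique daughter's FOCUS is instantiated,
    # the FOCUS of the mother is the mother sign itself (wide focus).
    if most_oblique.get("info-struc", {}).get("focus") is not None:
        mother["info-struc"] = {"focus": mother}
        return mother
    # Clause (i): otherwise inherit any instantiated INFO-STRUC values
    # (narrow foci, links and tails) from the daughters.
    mother["info-struc"] = {}
    for d in daughters:
        mother["info-struc"].update(d.get("info-struc", {}))
    return mother

v = {"phon": "brought"}
np = {"phon": "the Delft China Set", "accent": "A"}
np["info-struc"] = {"focus": np}        # A accent: the NP is its own FOCUS
vp = project_info_struc({"phon": "brought the Delft China Set"}, [v, np])
print(vp["info-struc"]["focus"] is vp)  # True: wide VP focus
```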

### **3.2.2 Information structure as structured meanings (De Kuthy 2002)**

The so-called structured meaning approach to information structure (von Stechow 1981; Jacobs 1983; Krifka 1992) provides a compositional semantic mechanism based on separate representations of the semantic contribution of the focus and that of the background. De Kuthy (2002), De Kuthy & Meurers (2003) and Webelhuth (2007) worked out how such a structured meaning approach can be integrated into the HPSG architecture.

As discussed above, in De Kuthy (2002), the information structure is encoded in the attribute INFO-STRUC as part of signs and has the appropriate features FOCUS and TOPIC, with lists of so-called meaningful expressions as values. The background of a sentence in De Kuthy's approach is then defined to be that part of the logical form of the sentence which is neither in focus nor in topic. This


characterization of background closely resembles the definition of background employed by the *structured meaning* approach to focus (cf. Krifka 1992). The INFO-STRUC value of a simple sentence with the focus as indicated in (11) is thus structured as shown in (12).

(11) Peter [[liest ein BUCH]].
     Peter reads a book
     'Peter is reading a BOOK.'


(12) A sign representation including information structure (adapted from De Kuthy 2002: 163):

[SS|LOC|CONT|LF ∃x[buch′(x) ∧ lesen′(peter′, x)]
 INFO-STRUC [FOCUS ⟨∃x[buch′(x) ∧ lesen′(peter′, x)]⟩
             TOPIC ⟨⟩]]

The INFO-STRUC values of phrases are constrained by principles such as the one in (13). The original principle formulated in De Kuthy (2002: 169) only contains the first two disjuncts shown in (13). The third disjunct is added in De Kuthy & Meurers (2003). Sentences where the focus or the topic does not project represent the most basic case: only those words bearing an accent are in the topic or in the focus of an utterance.

(13) Principle 1: Extended focus projection principle (De Kuthy & Meurers 2003: 105):

*phrase* ⇒
  [INFO-STR|FOCUS [1] ⊕ collect-focus([2])
   HEAD-DTR|INFO-STR|FOCUS [1]
   NON-HEAD-DTRS [2]]
∨ [PHON|PHON-STR *list* ⊕ [2]
   SS|LOC [CAT|HEAD *noun* ∨ *prep*
           CONT|LF [3]]
   INFO-STR|FOCUS ⟨[3]⟩
   any-dtr([PHON|PHON-STR [2]
            SS|L|CONT|LF [4]
            INFO-STR|FOCUS ⟨[4]⟩])]
∨ [SYNSEM|LOC [CAT|HEAD *verb*
               CONT|LF [3]]
   INFO-STR|FOCUS ⟨[3]⟩
   NON-HEAD-DTRS ⟨…, [SYNSEM [FPP +
                              LOC|CONT|LF [4]]
                      INFO-STR|FOCUS ⟨[4]⟩], …⟩]
∨ …


In this case, the mother of a phrase just collects the focus values of all her daughters, as ensured by the first disjunct of the principle in (13).<sup>6</sup> The relation collect-focus ensures that from the list of non-head daughters, the FOCUS value of every non-head daughter is added to the list of FOCUS values of the entire phrase. A similar principle is needed to determine the TOPIC value of phrases.
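The recursion in collect-focus can be mimicked directly. A minimal Python rendering of the first disjunct of (13) under a hypothetical dictionary encoding (the HPSG original is a relational constraint, not a function):

```python
def collect_focus(non_head_daughters):
    """collect-focus in the spirit of footnote 6: concatenate the FOCUS
    lists of all non-head daughters, in order."""
    if not non_head_daughters:
        return []
    first, rest = non_head_daughters[0], non_head_daughters[1:]
    return first["info-str"]["focus"] + collect_focus(rest)

def mother_focus(head_dtr, non_head_dtrs):
    """First disjunct of (13): the mother's FOCUS is the head daughter's
    FOCUS plus the collected foci of the non-head daughters."""
    return head_dtr["info-str"]["focus"] + collect_focus(non_head_dtrs)

hd = {"info-str": {"focus": ["lesen'"]}}
nhd = [{"info-str": {"focus": ["buch'"]}}, {"info-str": {"focus": []}}]
print(mother_focus(hd, nhd))  # ["lesen'", "buch'"]
```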

For cases of so-called focus projection<sup>7</sup> in NPs and PPs, it is assumed in De Kuthy (2002: 169) that it is sufficient to express that the entire NP (or PP) can be focused if the rightmost constituent in that NP (or PP) is focused, as expressed by the second disjunct of the principle in (13). If focus projection is possible in a certain configuration then this is always optional, therefore the focus projection principle for nouns and prepositions is formulated as a disjunct. The second disjunct of the principle in (13) ensures that a phrase headed by a noun or a preposition can only be in the focus (i.e., its entire logical form is token identical to its FOCUS value) if the daughter that contributes the rightmost part of the phonology of the phrase is itself entirely focused. The relation *any-dtr* is a description of a sign with a head daughter or a list of non-head daughters and thereby ensures that it can be either the head (i.e., head daughter) of the phrase itself, or any non-head daughter that meets the condition of being focused. Again, a similar principle needs to be provided for the TOPIC value of nominal and prepositional phrases.

For the verbal domain, the regularities are known to be influenced by a variety of factors, such as the word order and lexical properties of the verbal head (cf., e.g., von Stechow & Uhmann 1986). Since verbs need to be able to lexically mark which of their arguments can project focus when they are accented, De Kuthy & Meurers (2003) introduce the boolean-valued feature FOCUS-PROJECTION-POTENTIAL (FPP) for objects of type *synsem*. (14) shows the relevant part of the lexical entry of the verb *lieben* 'love', which allows projection from the object but not from the subject:

<sup>6</sup>The presentation differs from that in De Kuthy (2002); it is the one from De Kuthy & Meurers (2003). Definitions of the auxiliary relations:

any-dtr([1]) := [HEAD-DTR [1]].
any-dtr([1]) := [NON-HEAD-DTRS element([1])].
collect-focus(⟨⟩) := ⟨⟩.
collect-focus(⟨[INFO-STRUC|FOCUS [1]] | [2]⟩) := [1] ⊕ collect-focus([2]).

<sup>7</sup>Focus projection is a term commonly used to describe the fact that in an utterance with prosodic marking of focus on a word, this marking can lead to ambiguity, in that different constituents containing the word can be interpreted as focused (cf. Gussenhoven 1983; Selkirk 1995).


(14) The focus projection potential of *lieben* (De Kuthy & Meurers 2003: 105):

  [PHON|PHON-STR ⟨lieben⟩
   ARG-ST ⟨[FPP −
            LOC|CAT|HEAD [*noun*
                          CASE *nom*]],
           [FPP +
            LOC|CAT|HEAD [*noun*
                          CASE *acc*]]⟩]

The third disjunct of the principle in (13) then specifies under which circumstances focus can project in the verbal domain: a phrase headed by a verb can only be in focus (i.e., its entire logical form is token identical to an element of its FOCUS value) if the daughter that has the focus projection potential (FPP +) is itself entirely focused.
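This check can be phrased as: some FPP+ daughter of the verb-headed phrase must itself be entirely focused. A minimal sketch, again under hypothetical encoding assumptions:

```python
def verbal_focus_projection_ok(phrase):
    """Third disjunct of (13): a verb-headed phrase may be entirely
    focused only if some FPP+ daughter is itself entirely focused."""
    if phrase["head"] != "verb":
        return False
    for d in phrase["non-head-dtrs"]:
        # "entirely focused": the daughter's LF is an element of its
        # FOCUS list (token identity in HPSG, equality in this sketch).
        entirely_focused = d["lf"] in d["focus"]
        if d["fpp"] and entirely_focused:
            return True
    return False

subj = {"fpp": True, "lf": "ein-aussenseiter'", "focus": ["ein-aussenseiter'"]}
vp = {"head": "verb", "non-head-dtrs": [subj]}
print(verbal_focus_projection_ok(vp))  # True: focus may project over the VP
```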

### **3.2.3 Information structure principles in MRS**

As introduced above, in the MRS-based approach of Paggio (2009), the information structure is part of the CONTEXT, consisting of FOCUS, TOPIC and BACKGROUND features which are structure-shared with the respective INDEX values of the semantic representation of a phrase. Paggio (2009) connects the distribution of information structure values to particular clausal types and introduces new phrasal subtypes which constrain the distribution of information structure in the respective phrases. One such new phrasal subtype is the type *focus-inheritance* as defined in (15), which then has to be cross-classified with every basic phrasal subtype (such as *hd-comp, hd-spec, hd-adj*, etc.) in order to constrain the distribution of focus values across all phrasal subtypes.

(15) Principle for focus inheritance (Paggio 2009: 155):

*focus-inheritance* ⇒
  [SYNSEM|LOC|CONTEXT [FOCUS ⟨[2], [1]⟩
                       BG [3]]
   HD|SYNSEM|LOC|CONTEXT [FOCUS [1]
                          BG [3]]
   NON-HD [SYNSEM|LOC|CONTEXT|FOCUS ⟨[2]⟩
           ACCENT *true*]]

The principle in (15) ensures that for signs of type *focus-inheritance*, the list of focus values of the mother is the list of focus values of the head daughter<sup>8</sup> plus the focus value of the non-head daughter, in case it is accented. Similar principles are defined for the inheritance of background values, also depending on the accent status of the non-head daughter. Paggio also assumes that each phrasal subtype has further subtypes connecting it to one of the information structure inheritance phrasal types. For example, she assumes that there is a phrasal subtype *focus-hd-adj* that is a subtype both of *hd-adj* and of *focus-inheritance*. Finally, clausal types are introduced that account for the information structure values at the top level of a clause. For example, the specification for *decl-main-all-focus* as shown in Figure 3 describes a clause in which both the background and the topic values are empty and the mother collects the focus values from the head and the non-head daughters.<sup>9</sup>

<sup>8</sup>This is not correctly specified in the original principle as formulated by Paggio (2009). If the head daughter can have a list with more than one element as its FOCUS value, then this entire list would have to be added to the list of FOCUS values of the mother, and not just one element of that list.

Figure 3: Declarative all-focus construction (Paggio 2009: 160)


In contrast to Paggio's approach, Song & Bender (2012) and Song (2017) locate the representation of information structure within the MRS-based CONTENT value of signs. The list elements of information structural values that are built up for a phrase consist of focus, background or topic elements coindexed with the semantic INDEX values of the daughters of that phrase. The main point of their approach is that they want to be able to represent underspecified information structural values, since very often a phrase, for example with a certain accent pattern, is ambiguous with respect to the context in which it can occur and thus is ambiguous with respect to its information structure values. An example they discuss is the one in (16), where the first sentence could be an answer to the question *What barks?* and thus signal narrow focus, whereas the second utterance could be an answer to the question *What happened?* and signal broad focus.

<sup>9</sup>Again, the list specifications as formulated by Paggio (2009) are not entirely correct: if the head daughter's FOCUS value 2 is a list with more than one element, the entire list has to be added to the list of FOCUS values of the mother. The order of 1 and 2 in the FOCUS list in Figure 3 seems to differ from what is stated in (15).


(16) a. [[The DOG]] barks.
     b. [[The DOG barks]].

The approach pursued in Song & Bender (2012) thus assumes that the two possible readings in (16) are further specializations of one MRS which is associated with one syntactic structure and includes underspecified values, in particular the type of the ICONS element for the constituent *barks*, leaving it open whether that is part of the focus or not.

In Song (2017), this approach is further spelled out and lexical rules are added that allow transitive and ditransitive verbs to be a possible source for focus projection. In an example such as (17), Song (2017) assumes that focus can only project if the last argument is accented as in (17b) (here accent is shown on the noun *book* in small caps), but not if some other argument is accented, as in (17a), where the proper noun *Lee* is accented.

(17) a. Kim sent LEE the book.
     b. Kim sent Lee the BOOK.

Accordingly, there are two lexical items for the verb *send*, which are derived by the lexical rules shown in (18).

(18) Focus projection lexical rules (Song 2017: 227):

a. *no-focus-projection-rule* ⇒
   [ICONS-KEY *non-focus*
    VAL [SUBJ ⟨[ICONS-KEY *non-focus*]⟩
         COMPS ⟨…, [MKG|FC −]⟩]]

b. *focus-projection-rule* ⇒
   [VAL|COMPS ⟨…, [MKG|FC +]⟩]


The lexical rule *no-focus-projection-rule* requires lexical items to have a non-focus-marked element as the last element on the COMPS list, and in addition the word itself has an ICONS-KEY of type *non-focus* preventing the word itself from being focused. The lexical rule *focus-projection-rule* has a focus-marked element as the last element in the COMPS list. It is not further specified whether only that focused complement or also the word itself contributes anything to the ICONS value. In the example (17b), if the verb *sent* is licensed by the rule *focus-projection-rule*, either only *the book*, or the entire VP *sent Lee the book*, or even the entire sentence *Kim sent Lee the book* could be focused.

Since the approach of Song (2017) is part of a larger grammar fragment (the LinGO Grammar Matrix; Bender et al. 2010) with the aim of parsing and generating sentences from a large number of different languages, it contains a multitude of lexical and phrasal types and principles. Some of these specifications are introduced to capture very language-specific information structure properties (such as morphological markings, word order constraints, etc.), while others are necessary for the specific way in which grammar fragments in the LinGO Grammar Matrix are implemented and processed. It would be far beyond the scope of this article to discuss all these principles and specifications in detail and I therefore only included the most essential aspects of Song's approach in my discussion here.

# **4 Topics**

Most HPSG approaches are based on a focus/background division of the information structure of signs. To capture aspects of a topic vs. comment distinction, or to be able to specify topics as a special element in the background, they include an additional feature or substructure for topics. Engdahl & Vallduví (1996), for example, divide the GROUND into LINK and TAIL, where the link is a special element of the background linking it to the previous discourse, just like topics. In the approaches of De Kuthy (2002) and Paggio (2009), an additional feature TOPIC is introduced, parallel to FOCUS and BACKGROUND, in order to distinguish discourse referents as topics from the rest of the background.

Most approaches do not introduce separate mechanisms for the distribution of TOPIC values, but rather assume that similar principles as the ones introduced for focus can constrain topic values, as mentioned above for the approach of De Kuthy (2002). A more specific example can be found in Paggio (2009), where a constraint on topicalization constructions including a topic-comment partitioning is formulated, as illustrated in Figure 4. This *inv-topic-comment* phrasal type



Figure 4: Topicalization construction with extracted topic (Paggio 2009: 160)

constrains the information structure values of topicalization constructions in Danish that involve subject-verb inversion,<sup>10</sup> where the topic corresponds to the topicalized complement, as illustrated by the example in (19) from Paggio (2009: 142).

(19) og [i det nederste vindue] [tager man og saetter urtepotten]
     and in the lowest window takes one and puts flowerpot.DEF
     'And in the lowest window you take and put the flowerpot.'

In Song (2017), a number of lexical and phrasal principles are provided with the purpose of licensing topic-comment structures. The principles and lexical entry in (20) are spelled out in order to license topic-comment constructions in Japanese which are characterized by the occurrence of the topic marker *wa* and a left dislocated topic phrase.

(20) Licensing topic-comment structures in Song (2017: 163, 199):


<sup>10</sup>Although Danish is generally considered to be a V2 language, where any kind of constituent (not only the subject) can occur in the position before the finite verb, Paggio (2009) seems to assume that clauses in which a dependent different from the subject, i.e., an object or some adjunct phrase, occurs before the finite verb have a different structure than those where the subject occurs in sentence-initial position.


a. *topic-comment* ⇒
   [MKG *tp*
    NHD [MKG *tp*
         L-PERIPH +]]

b. *top-scr-comp-head* ⇒
   [HD [VAL|COMPS ⟨⟩]
    NHD [ICONS-KEY *contrast-topic*]]

c. *wa-marker* ⇒
   [STEM ⟨wa⟩
    ICONS-KEY [1]
    MKG *tp*
    COMPS ⟨[INDEX [2]]⟩
    ICONS <! [1][*contrast-or-topic*
                 TARGET [2]] !>]

The constraint in (20a) on the phrasal subtype *topic-comment* ensures that only the non-head daughter is marked as a topic (according to Song (2017: 122), the type *tp* is a subtype of *mkg* and is constrained as [TP +]), whereas the head daughter functions as the comment (and presumably contains some focused material). The specification [L-PERIPH +] indicates that a constituent with this feature value cannot combine with another constituent to its left.

A Japanese topic-comment structure, such as the one in (21) (Song 2017: 198), is licensed by the phrasal subtype *top-scr-comp-head*, i.e., it is assumed that the fronted complement, the *wa*-marked NP *sono hon wa* 'the book' is scrambled to the left peripheral position and is interpreted as a contrastive topic phrase.

(21) sono hon wa Kim ga yomu. (Japanese)
     this book WA Kim NOM read
     'This book, Kim read.'

The topic marker *wa* in Japanese is treated as an adposition with the lexical specifications shown in (20c). The entire sentence is thus licensed as a head complement structure, where the object NP is scrambled to the sentence initial position and functions as a contrastive topic. The *tp* marking of the entire *topic-comment* phrase ensures that this phrase cannot be embedded as the comment in another *topic-comment* phrase.

# **5 Givenness**

In De Kuthy & Meurers (2011), it is shown how the HPSG approach to information structure of De Kuthy (2002) and colleagues can be extended to capture givenness and to make the right predictions for so-called *deaccenting*, which has been shown to be widespread (Büring 2006). In contrast to Schwarzschild (1999), who spells out his approach in the framework of alternative semantics (Rooth 1992), they show how the notion of givenness can be couched in a standard structured meaning approach – thereby preserving the explicit, compositional representations of focus.

The example in (22) illustrates the necessity of including information about givenness in the information structural setup.

(22) a. What did John rent?
     b. He (only) rented [[a GREEN convertible]].

The context in (22) introduces some conference participants, Bill, the rental of vehicles, and red and blue convertibles into the discourse. Based on this context, the question in (22a) asks for the object that John is renting, i.e., for the focus of the answer. One can answer this question with sentence (22b), where *a green convertible* is the focus: out of all the things John could have rented, he picked a green convertible. In this focus, only *green* is new to the discourse, whereas convertibles were already given in the context, and still the entire NP is in the focus.

To capture such cases of focus projection, an additional feature GIVEN is introduced into the setup of De Kuthy (2002), discussed in Section 3.2.2. The relation between pitch accents and the information structure of words is still defined by the principle shown in (23), depending on the type of accent the word receives.

(23) Relating intonation and information structure for words (De Kuthy & Meurers 2011: 294):


In addition, the Focus Projection Principle originally introduced in De Kuthy (2002) and then extended in De Kuthy & Meurers (2003) is extended with a disjunct capturing focus projection in the presence of givenness (De Kuthy & Meurers 2011). (24) shows the resulting principle.<sup>11</sup>

<sup>11</sup>De Kuthy & Meurers (2011: 293) introduce the feature STRUCTURED-MEANING as appropriate for all signs, INFO-STRUC is changed to only be appropriate for unembedded-signs. An additional constraint ensures that the value of INFO-STRUC for unembedded signs is that composed in STRUCTURED-MEANING.


The new fourth disjunct of the Extended Focus Projection Principle<sup>12</sup> captures the cases previously unaccounted for where given material in a focused phrase is deaccented. Focus in those examples can project from a focused daughter in a position which normally does not allow focus projection. This is only an option if all other daughters in that focused phrase are *given*. Spelling this out, the fourth disjunct of the principle in (24) specifies that the mother of a phrase can be in the focus (i.e., the entire LF value of the mother's CONTENT is token identical to an element on the mother's

<sup>12</sup>The auxiliary relations are defined as:

dtrs-list(⟨[1]⟩ ⊕ [2]) := [HEAD-DTR [1], NON-HEAD-DTRS [2]].
given-sign-list := list([SS|LOC|CONT|LF [1], STRUC-MEANING|GIVEN ⟨…, [1], …⟩]).

FOCUS list) if it is the case that the list of all daughters (provided by *dtrs-list*) consists of *given* signs (a list described by *given-sign-list*) into which a single *focused* sign is shuffled (○).13,14 As before, a sign is focused if its LF value is token identical to an element of its FOCUS value; and a sign is given if its LF value is token identical to an element of its GIVEN value.
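Since shuffle merely interleaves two lists while preserving their internal order, checking this disjunct comes down to verifying that exactly one daughter is focused and all remaining daughters are given. A minimal Python sketch (hypothetical encoding) replaying the *green convertible* case discussed below:

```python
def focus_projection_via_givenness(daughters):
    """Fourth disjunct of the Extended Focus Projection Principle:
    focus may project iff exactly one daughter is focused and
    every other daughter is given."""
    def is_focused(d):
        return d["lf"] in d["focus"]
    def is_given(d):
        return d["lf"] in d["given"]
    focused = [d for d in daughters if is_focused(d)]
    others = [d for d in daughters if d not in focused]
    return len(focused) == 1 and all(is_given(d) for d in others)

green = {"lf": "green'", "focus": ["green'"], "given": []}
convertible = {"lf": "convertible'", "focus": [], "given": ["convertible'"]}
print(focus_projection_via_givenness([green, convertible]))  # True
```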

The pitch accent in example (22b) is on the adjective *green*, so that the principle in (8) on p. 1050 licenses structure sharing of the adjective's content with its FOCUS value. In the context of the question (22a), the entire NP *a green convertible* from example (22b) is in focus. In the phrase *green convertible*, the clause licensing focus projection in NPs does not apply, since the adjective *green*, from which the focus has to project in this case, is not the rightmost element of the phrase. What does apply is the fourth disjunct of the principle licensing focus projection in connection with givenness. Since the noun *convertible* is given, the adjective *green* is the only daughter in the phrase that is not given and focus is allowed to project to the mother of the phrase. In the phrase *a green convertible*, focus projection is again licensed via the clause for focus projection in noun phrases, since the focused phrase *green convertible* is the rightmost daughter in that noun phrase.

# **6 Information structure and word order**

The explicit representation of information structure as part of signs in HPSG opens up the possibility of providing explanations for constraints previously stipulated in syntax, such as word order constraints, by deriving the constraints from the nature of the integration of a sentence into the discourse. Many of the approaches discussed in the previous section employ the information structural architecture exactly in this way and formulate principles linking word order to discourse properties.

A first such approach is presented in Engdahl & Vallduví (1996), where word order constraints for Catalan are couched in the information structure setup discussed in Section 3.2. The basic observation is that in Catalan, the word order within the sentential core is VOS and that every constituent within this sentential

<sup>13</sup>The relation "shuffle" is used as originally introduced in Reape (1994): the result is a list that contains all elements from the two input lists and the order of elements from the original lists is preserved, see the discussion in Müller (2021a: Section 6.1), Chapter 10 of this volume.

<sup>14</sup>If only binary structures are assumed, as in the examples in this chapter, the principle can be simplified. Here, I kept the general version with recursive relations following De Kuthy & Meurers (2003), which also supports flatter structures.


core is interpreted as focal. If an argument of the main verb of a sentence is to be interpreted as non-focal, it must be clitic-dislocated. The example in (25) from Engdahl & Vallduví (1996) illustrates the two possible cases: the argument *a Barcelona* 'to Barcelona' can be topicalized as in (25b) or positioned at the end of the sentence as in (25c) in order to be interpreted as non-focal.



With respect to modeling this within their HPSG account, they assume that phrases associated with a LINK interpretation should be constrained to be left dislocated, whereas phrases associated with a TAIL interpretation should be right attached. They thus introduce the following ID schema for Catalan:

(26) *Head-Dislocation Schema for Catalan:* The DTRS value is an object of sort *head-disloc-struc* whose HEAD-DTR|SYNSEM|LOCAL|CATEGORY value satisfies the description [HEAD *verb*[VFORM *finite*], SUBCAT ⟨⟩], whose DISLOC-DTRS|CONTEXT|INFO-STRUC value is instantiated, and for each DISLOC-DTR, the HEAD-DTR|SYNSEM|LOCAL|CONTENT value contains an element which stands in a *binding* relation to that DISLOC-DTR.

The principle requires the information structure value of dislocated daughters of a finite sentence to be GROUND. An additional LP statement is then needed that captures the relation between the directionality of the dislocation and a further restriction of the GROUND value, as illustrated in (27).

(27) LP constraint on information structure in Catalan (adapted from Engdahl & Vallduví 1996: 65):

LINK > FOCUS > TAIL

Such an LP statement is meant to ensure that link material must precede focus material and focus material must precede tails. Thus, Engdahl & Vallduví (1996) ensure that left-dislocated constituents are always interpreted as links and right-dislocated constituents as tails.
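Checking such an LP statement amounts to verifying that the information structure roles of sister constituents appear in non-decreasing rank order. A minimal sketch under hypothetical encoding assumptions:

```python
# Hypothetical ranks implementing the LP statement LINK > FOCUS > TAIL
# (read ">" as "precedes").
LP_RANK = {"link": 0, "focus": 1, "tail": 2}

def satisfies_lp(constituents):
    """True iff link material precedes focus material, which in turn
    precedes tail material, in the given surface order."""
    ranks = [LP_RANK[c["is-role"]] for c in constituents]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

clause = [{"is-role": "link"}, {"is-role": "focus"}, {"is-role": "tail"}]
print(satisfies_lp(clause))  # True
```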

The insights from Engdahl & Vallduví's approach are the basis for an approach to clitic left dislocation in Greek presented in Alexopoulou & Kolliakou (2002).


The representation of information structure with the features FOCUS and GROUND (further divided into LINK and TAIL) is taken over, as are the phonological constraints on words and the information structure instantiation principle. In order to account for clitic left dislocation, as illustrated in (28) from Alexopoulou & Kolliakou (2002: 196), an additional feature CLITIC is introduced as appropriate for *nonlocal* objects.

(28) b. Tin parastasi *ti* skinothetise o Karolos KOUN …
         the performance FEM.3SG.ACC directed the Karolos Koun
         'Karolos Koun directed the performance …'

The Linkhood Constraint shown in (29) ensures that links (i.e., elements whose INFO-STRUC|LINK value is instantiated) can only be fillers that are "duplicated" in the morphology by a pronominal affix, i.e., it is required that there is an element 1 on the CLITIC list of the head daughter that is structure-shared with the filler's HEAD value. The use of the disjoint union relation ⊎<sup>15</sup> ensures that the singleton element 1 representing the doubled clitic is the only element on the phrase's clitic list with these specifications. In addition, it is required that the filler-daughter 2 is structure-shared with the LINK attribute in the information structure of the mother.

(29) The Linkhood Constraint for clitic left dislocation phrases (Alexopoulou & Kolliakou 2002: 238):

[*clitic-left-disloc-phrase*
 INFO-STRUC|LINK [2]]
→ [2][PHON|ACCENT *u*
      HEAD [1]] , H[*phrase*
                    HEAD *verb*
                    CLITIC {[1]} ⊎ [3]]

The Linkhood Constraint thus has two purposes: it ensures clitic doubling and it connects the particular word order of a left dislocated phrase to discourse properties by requiring the filler daughter to be the link of the entire clause. A related proposal for left dislocated elements in French can be found in Abeillé et al. (2008), where two types of sentences with preposed NPs are analyzed as head-filler clauses with additional constraints on the discourse properties of the respective filler daughters.

<sup>15</sup>Alexopoulou & Kolliakou (2002) provide no exact definition for the use of the symbol ⊎ (disjoint union), but a definition that is often used within HPSG approaches can be found in Manandhar (1994a: 84).


Other approaches dealing with left dislocated phrases are the ones proposed by De Kuthy (2002) and De Kuthy & Meurers (2003); the former relates the occurrence of discontinuous NPs in German to specific information structural contexts, while De Kuthy & Meurers (2003) show that the realization of subjects as part of fronted non-finite constituents can be accounted for based on independent information structure conditions.

Based on the setup discussed in Section 3.2.2 above, constraints are formulated that restrict the occurrence of discontinuous NPs and fronted VPs based on their information structure properties. The type of discontinuous NPs at the center of De Kuthy's approach are so-called NP-PP split constructions, in which a PP occurs separate from its nominal head, as exemplified in (30).

(30) b. [Ein Buch] hat Max sich über Syntax ausgeliehen.
         a book has Max self about syntax borrowed
         'Max borrowed a book about syntax.'

The information structure properties of discontinuous noun phrases are summarized in De Kuthy (2002: 176) in the following principle:

In an utterance, in which a PP occurs separate from an NP, either the PP or the NP must be in the focus or in the topic of the utterance, but they cannot both be part of the topic or the same focus projection. (De Kuthy 2002: 176)

The last restriction can be formalized as: the PP's or NP's CONTENT values cannot be part of the same meaningful expression on the FOCUS list or the TOPIC list of the INFO-STRUC value of the utterance.

As discussed in De Kuthy & Meurers (2003), it has been observed that in German it is possible for unergative and unaccusative verbs to realize a subject as part of a fronted non-finite verbal constituent (Haider 1990). This is exemplified in (31) with examples from Haider (1990: 94):

(31) b. [Haare wachsen] können ihm nicht mehr.
         hair.NOM grow can him.DAT not anymore
         'His hair cannot grow anymore.'
     c. [Ein Außenseiter gewonnen] hat hier noch nie.
         an.NOM outsider won has here still never
         'An outsider has still never won here.'


In order to account for the context-sensitive occurrence of such fronted verbal constituents, specific information structure properties of fronted verb phrases need to be captured in a principle expressing what De Kuthy & Meurers refer to as Webelhuth's generalization (Webelhuth 1990: 53): in an utterance in which a verb phrase occurs as a fronted constituent (i.e., the filler of a head-filler phrase), this entire verb phrase must be in the focus of the utterance (i.e., the FOCUS value of the fronted constituent must be identical to its semantic representation). The formalization of this principle provided by De Kuthy & Meurers (2003) is shown in (32).

(32) Webelhuth's generalization (De Kuthy & Meurers 2003: 106):

[*head-filler-phrase*
 NON-HEAD-DTR|SYNSEM|LOC|CAT|HEAD *verb*]
⇒ [INFO-STRUC|FOCUS ⟨[1]⟩
   NON-HEAD-DTR [INFO-STRUC|FOCUS ⟨[1]⟩
                 SYNSEM|LOC|CONT|LF [1]]]
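Read as a conditional well-formedness check, (32) can be sketched as follows (hypothetical dictionary encoding; token identity is approximated by equality):

```python
def webelhuth_ok(phrase):
    """(32): in a head-filler phrase whose filler is verbal, the filler's
    LF must be its FOCUS value, which is also the FOCUS of the phrase."""
    filler = phrase["non-head-dtr"]
    if not (phrase["type"] == "head-filler-phrase"
            and filler["head"] == "verb"):
        return True  # the constraint does not apply
    return (filler["focus"] == [filler["lf"]]
            and phrase["focus"] == filler["focus"])

fronted_vp = {"head": "verb", "lf": "aussenseiter-gewonnen'",
              "focus": ["aussenseiter-gewonnen'"]}
s = {"type": "head-filler-phrase", "non-head-dtr": fronted_vp,
     "focus": ["aussenseiter-gewonnen'"]}
print(webelhuth_ok(s))  # True
```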

Combining the new lexical specifications, the focus projection rule for the verbal domain and the partial fronting focus requirement with the basic setup of De Kuthy (2002), one obtains a theory which predicts that subjects can only be part of a fronted verb phrase if they can be the focus exponent.<sup>16</sup> The sketch of an analysis for an example such as (31c) is illustrated in Figure 5. The entry of *gewinnen* 'to win' (the base form of the verb *gewonnen* in (31c)), shown in (33), encodes the lexical property that the subject of this intransitive verb has focus projection potential.

(33) The lexical item of *gewinnen* 'to win':

  [PHON|PHON-STR ⟨gewinnen⟩
   ARG-ST ⟨[FPP +
            LOC|CAT|HEAD [*noun*
                          CASE *nom*]]⟩]
Under the assumption that in (31c) the noun *Außenseiter* 'outsider' carries a pitch accent, the information structure principle for words in (8) on p. 1050 ensures that the noun contributes its LF value to its FOCUS value. The focus projection principle in (13) on p. 1053 ensures that the focus can project over the entire NP *ein Außenseiter* 'an outsider', i.e., the NP's FOCUS element is identical to this

<sup>16</sup>Not every element in a syntactic phrase corresponding to the focus is prosodically prominent. Generally only one element is: the so-called *focus exponent* (cf. Selkirk 1995: 555).

Figure 5: Partial VP fronting in De Kuthy & Meurers (2003)

NP's LF value [3].<sup>17</sup> Since *ein Außenseiter* 'an outsider' as the subject of *gewonnen* 'won' in the tree in Figure 5 is lexically marked as FPP +, the principle governing focus projection in the verbal domain in (13) licenses the focus to project over the entire fronted verb phrase *ein Außenseiter gewonnen* 'an outsider won'. The fronted constituent thus contributes its LF value to its FOCUS value. In this example, the focus does not project further, so that in the head-filler phrase the focus values of the two daughters are simply collected, as licensed by the first disjunct of the focus principle in (13) discussed earlier in Section 3.2.2. As a result, the FOCUS value of the fronted verb phrase is the FOCUS value of the entire sentence. Finally, note that the example satisfies Webelhuth's generalization, which requires a fronted verb phrase to be the focus of the utterance, as formalized in the principle in (32).

In the same spirit, Bildhauer & Cook (2010) show that sentences in which multiple elements have been fronted are directly linked to specific types of information structure. In German, a V2 language, normally exactly one constituent occurs in the position before the finite verb in declarative sentences. But so-called multiple fronting examples with more than one constituent occurring before the finite verb are well attested in naturally-occurring data (Müller 2003). Two examples from Bildhauer & Cook (2010) are shown in (34).<sup>18</sup>

<sup>17</sup>Note that the focus value of the entire NP is different from the focus value of just the noun *Außenseiter* 'outsider'. If the focus value of the noun was structure shared with the focus value of the entire NP, this would mean that there is only a narrow focus on the noun itself, excluding the determiner and possible modifiers.

<sup>18</sup>The examples are corpus examples that were extracted by Bildhauer & Cook (2010) from Deutsches Referenzkorpus (DeReKo), hosted at the Institut für Deutsche Sprache, Mannheim: http://www.ids-mannheim.de/kl/projekte/korpora, see also Bildhauer (2011).


(34) b. [Stets] [einen Lacher] [auf ihrer Seite] hatte die Bubi Ernesto Family.
         always a laugh on their side had the Bubi Ernesto Family
         'Always good for a laugh was the Bubi Ernesto Family.'

As discussed by Bildhauer & Cook, such multiple fronting examples seem to require very special discourse conditions in order to be acceptable. Just like the fronted verb phrases discussed in De Kuthy & Meurers (2003) above, Bildhauer & Cook (2010) propose analyzing multiple fronting constructions in German as head-filler phrases which in this case introduce a topic shift. Following the approach by Müller (2005), multiple fronting configurations can be identified via the filler daughter which must have a HEAD|DSL (double slash) value of type *local*. <sup>19</sup> Bildhauer & Cook (2010) assume that an information structure attribute is specified within *synsem* objects, with the features FOCUS and TOPIC taking lists of *elementary predications* as their values. In general, multiple fronting *head-filler* phrases are restricted by the constraint in (35).

(35) Relating multiple fronting to focus (Bildhauer & Cook 2010: 75):

  [*head-filler-phrase*
   NON-HEAD-DTRS ⟨[SYNSEM|LOC|CAT|HEAD|DSL *local*]⟩]
  ⇒ [IS *pres* ∨ *a-top-com* ∨ …]

  [*head-filler-phrase*
   IS *pres*]
  ⇒ [SYNSEM|LOC|CAT|HEAD|DT ⟨[LOC|CONT|RELS [1]]⟩
     HD-DTR|SS|IS|FOCUS [1]]

The first constraint ensures that *head-filler* phrases that are instances of multiple frontings are restricted to have an IS-value of an appropriate type.<sup>20</sup> The second constraint then ensures that in presentational multiple frontings, the designated topic must be located in the head daughter (i.e., the verbal head of the *head-filler-phrase*) and must be focused. The feature DT (designated topic) lexically specifies which daughter, if any, is normally realized as the topic of a particular verb. This constraint thus encodes what Bildhauer & Cook (2010) call "topic shift": the non-fronted element in a multiple fronting construction that would preferably be the topic is realized as a focus. A similar constraint is introduced for another instance of multiple frontings, which is called *propositional assessment* multiple fronting. Here it has to be ensured that the designated topic must be realized as the topic somewhere in the head daughter and the head daughter must also contain a focused element.

<sup>19</sup>In Müller's (2005; 2021b) formalization, filler daughters in multiple fronting configurations (and only in these) have a HEAD|DSL value of type *local*, i.e., they contain information about an empty verbal head. The DSL ('double slash') feature is needed to model the HPSG equivalent of verb movement from the sentence-final position to initial position. See also Müller (2021a: Section 5.1), Chapter 10 of this volume.

<sup>20</sup>Bildhauer & Cook (2010: 75) assume that the type *is* as the appropriate value for IS has several subtypes specifying specific combinations of TOPIC and FOCUS values, such as *pres* for presentational focus or *a-top-com* for assessed topic-comment.

Webelhuth (2007) provides another account of the special information structural requirements of fronted constituents, in this case of predicate fronting in English that is based on the interaction of word order and information structural constraints.

(36) I was sure that Fido would bark and *bark he did*.

The principles that are part of Webelhuth's account require that in such cases of predicate fronting, the auxiliary is focused and the remainder of the sentence is in the background. The two principles needed for this interaction are shown in (37).

(37) Predicate preposing phrases (Webelhuth 2007: 318):

  [*aux-word*
   ARG-ST ⟨NP, *gap-ss*⟩]
  ⇒ [SS|STATUS *fc*
     ARG-ST ⟨[STATUS *bg*], *gap-ss*⟩]

  *pred-preposing-phrase* ⇒ …

The first constraint ensures that auxiliary words whose predicate complement has the potential to be preposed (i.e., is of type *gap-ss*) have the information status *focus*, whereas the status of the first argument (the subject) is *background*. Additional constraints then ensure that auxiliary words with a gapped second argument can only occur in predicate preposing phrases, and vice versa, that predicate preposing phrases contain the right kind of auxiliary.

# **7 Information structure and prosody**

Many languages mark information structure prosodically, for example English and German, where pitch accents of various shapes are used to mark focus. Accordingly, several of the approaches discussed above include a component which


enriches the phonological representation of signs such that it allows the integration of the necessary prosodic aspects like accents.

Engdahl & Vallduví (1996) assume that signs can be marked for particular accents signaling focus or links in English, so-called A and B accents. In a similar way, De Kuthy (2002) extends the value of PHON such that it includes a feature ACCENT, in order to formulate constraints on the connection between accents and information structure markings. Most of the approaches discussed above do not include a detailed analysis of the prosodic properties of the respective language that is being investigated with respect to discourse properties. As a result, most approaches do not go beyond the postulation of one or two particular accents, which are then somehow encoded as part of the PHON value. These accents more or less serve as an illustration of how lexical principles can be formulated within a particular theory that constrain the distribution of information structural values at the lexical level. The more articulated such a representation of PHON values (including accent patterns, intonation contours, boundary tones, etc.) is, the more detailed the principles can be that connect information structure to prosodic patterns in languages that signal discourse properties via intonation contours.

In Bildhauer (2008), one such detailed account of the prosodic properties of Spanish is developed together with a proposal for how to integrate prosodic aspects into the PHON value, also allowing a direct linking of the interaction of prosody and information structure. In his account, the representation of PHON values in HPSG is enriched to include four levels of prosodic constituency: phonological utterance, intonational phrases, phonological phrases and prosodic words. The lowest level, prosodic words of type *pwrd*, include the feature SEGS, which corresponds to the original PHON value assumed in HPSG, and additional features such as PA for pitch accents or BD for boundary tones, which encodes whether a boundary tone is realized on that word. The additional features UT (phonological utterance), IP (intonational phrase) and PHP (phonological phrase) encode via the type *epr* (edges and prominence) which role a prosodic word plays in higher level constituents. For example, the feature DTE (designated terminal element) specifies whether the word is the most prominent one in a phonological phrase. A sign's PHON list contains all *pwrd* objects, and relational constraints define the role each prosodic word plays in the higher prosodic constituents. This flat representation of prosodic constituency still makes it possible to express constraints about intonational contours associated with certain utterance types. One example discussed in Bildhauer's work is the contour associated with broad focus declaratives in Spanish, which can be decomposed into a sequence of late-rise


(L\*H) prenuclear accents, followed by an early-rise nuclear accent (LH\*), followed by a low boundary tone (L%). The constraint introduced to model this contour for declarative utterances instantiates the BD value (boundary tone) of the last *pwrd* (prosodic word) in the PHON list to *low*, instantiates a nuclear pitch accent *low-high-star* on this rightmost prosodic word and ensures that a prenuclear pitch accent *low-star-high* is instantiated on every preceding compatible prosodic word. The constraint<sup>21</sup> is shown in (38).

(38) Intonational contour of Spanish declarative utterances (Bildhauer 2008: 142):

$$\textit{decl-tune}(\boxed{1}) \leftrightarrow \boxed{1} = \boxed{2} \oplus \left\langle \begin{bmatrix} \text{PA} & \textit{low-high-star} \\ \text{BD} & \textit{low} \end{bmatrix} \right\rangle \;\wedge\; \boxed{2} = \textit{list}\!\left(\begin{bmatrix} \text{BD} & \textit{none} \end{bmatrix}\right) \;\wedge\; \boxed{2} = \textit{list}\!\left(\begin{bmatrix} \text{PA} & \textit{none} \end{bmatrix}\right) \bigcirc \textit{list}\!\left(\begin{bmatrix} \text{PA} & \textit{low-star-high} \end{bmatrix}\right)$$

$$\begin{bmatrix} \textit{sign} \\ \text{EMBED} & - \end{bmatrix} \Rightarrow \begin{bmatrix} \text{PHON} & \boxed{1} \end{bmatrix} \wedge \textit{decl-tune}(\boxed{1})$$

The second constraint in (38) ensures that only unembedded utterances can be constrained to the declarative prosody described above. That this specific contour is then compatible with a broad focus reading is ensured by an additional principle expressing a general focus prominence constraint for Spanish, namely that focus prominence has to fall on the last prosodic word in the phonological focus domain, which, in the case of a broad focus, can be the entire utterance. The principle formulated in Bildhauer's account is shown in (39).

(39) Focus prominence in Spanish (Bildhauer 2008: 146):

$$\begin{bmatrix} \textit{sign} \\ \text{CONT} & \boxed{1} \\ \text{FOC} & \left\langle \boxed{1} \right\rangle \end{bmatrix} \Rightarrow \begin{bmatrix} \text{PHON} & \textit{list} \oplus \left\langle \begin{bmatrix} \text{UT}|\text{DTE} & + \end{bmatrix} \right\rangle \end{bmatrix}$$

Since only words that are the designated terminal element (DTE) can bear a pitch accent, the interplay of the two principles above ensures that in utterances with a declarative contour the entire phrase can be in the focus. These principles thus illustrate nicely not only how lexical elements can contribute to the information structure via their prosodic properties, but also how entire phrases with specific prosodic properties can be constrained to have specific information structural properties.

<sup>21</sup>Bildhauer (2008) uses the symbol ↔ for the definition of relational type constraints and the symbol ⇒ for other type constraints.
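To see how the two principles interact computationally, the following minimal sketch recasts (38) and (39) as procedural checks over a list of prosodic words. It is an illustration only, not Bildhauer's relational constraints: the class and function names (PWrd, decl_tune, broad_focus_ok) and the Spanish example sentence are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PWrd:
    segs: str                  # segmental content (the original PHON value)
    pa: Optional[str] = None   # pitch accent: "low-star-high", "low-high-star" or None
    bd: Optional[str] = None   # boundary tone: "low" or None
    dte: bool = False          # designated terminal element (UT|DTE)

def decl_tune(phon: list) -> bool:
    """Informal analogue of (38): a low boundary tone and a nuclear LH*
    accent on the last prosodic word, and only L*H prenuclear accents
    (or none) on the preceding words."""
    if not phon:
        return False
    *prenuclear, nuclear = phon
    if nuclear.bd != "low" or nuclear.pa != "low-high-star":
        return False
    return all(w.bd is None and w.pa in (None, "low-star-high")
               for w in prenuclear)

def broad_focus_ok(phon: list) -> bool:
    """Informal analogue of (39): focus prominence falls on the last
    prosodic word, i.e. it is the utterance's designated terminal element."""
    return bool(phon) and phon[-1].dte

# A hypothetical broad-focus declarative with the contour described above.
utterance = [PWrd("Maria", pa="low-star-high"),
             PWrd("compra", pa="low-star-high"),
             PWrd("mandarinas", pa="low-high-star", bd="low", dte=True)]
assert decl_tune(utterance) and broad_focus_ok(utterance)
```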


The approach of Song (2017) also includes a component that captures the interaction between prosodic properties of utterances and the effect of these properties on information structure. In order to include information structural constraints of the so-called A and B accents in English, several components of Bildhauer's (2008) phonological architecture are adapted to the information structural setup in Song (2017). Among them is the idea that in a phonological utterance (encoded via the feature UT), focus prominence is related to the most prominent word in that utterance, which is encoded via the constraint in (40).

(40) Prosodic marking of focus (Song 2017: 159):

$$\textit{lex-rule} \Rightarrow \begin{bmatrix} \text{UT}|\text{DTE} & \boxed{1} \\ \text{MKG}|\text{FC} & \boxed{1} \end{bmatrix}$$

Specific lexical principles for the A and B accents then ensure the correct information structural marking and specify which type of element has to be present on the ICONS list. The specifications necessary for English A accents that signal focus (here characterized as *high-star*) are shown in (41).<sup>22</sup>

(41) Focus marking of A accents in English (Song 2017: 160):

$$\textit{fc-lex-rule} \Rightarrow \begin{bmatrix} \text{UT}|\text{DTE} & + \\ \text{PA} & \textit{high-star} \\ \text{MKG} & \textit{fc-only} \\ \text{INDEX} & \boxed{1} \\ \text{ICONS-KEY} & \boxed{2} \\ \text{C-CONT}|\text{ICONS} & \left\langle !\ \boxed{2}\begin{bmatrix} \textit{semantic-focus} \\ \text{TARGET} & \boxed{1} \end{bmatrix}\ ! \right\rangle \\ \text{DTR}|\text{HEAD} & \textit{+nv} \end{bmatrix}$$

# **8 Conclusion**

I have discussed various possibilities for how to represent information structure within HPSG's sign-based architecture. Several approaches from the HPSG literature were presented which all have in common that they introduce a separate feature INFO-STRUC into the HPSG setup, but they differ in (i) where they locate such a feature, (ii) what the appropriate values are for the representation of information structure and (iii) how they encode principles constraining the distribution and interaction of information structure with other levels of the grammatical architecture. Finally, I discussed a number of theories in which phenomena such as word order are constrained to only be well-formed when they exhibit specific information structural properties.

<sup>22</sup>The HEAD value *+nv* refers to a 'disjunctive head type for nouns and verbs' (Song 2017: 159).

# **Acknowledgments**

I would like to thank two anonymous reviewers, Stefan Müller and Jean-Pierre Koenig for their insightful comments on earlier drafts. They helped to improve the chapter a lot. All remaining errors or shortcomings are my own. Furthermore, I am grateful to Elizabeth Pankratz for her very thorough proofreading.

# **Part IV**

# **Other areas of linguistics**

# **Chapter 24**

# **Processing**

# Thomas Wasow

Stanford University

Although not much psycholinguistic research has been carried out in the framework of HPSG, the architecture of the theory fits well with what is known about human language processing. This chapter enumerates aspects of this fit. It then discusses two phenomena, island constraints and relative clauses, in which the fit between experimental evidence on processing and HPSG analyses seems particularly good.

# **1 Introduction**

Little psycholinguistic research has been guided by ideas from HPSG (but see Konieczny 1996 for a notable exception). This is not so much a reflection on HPSG as on the state of current knowledge of the relationship between language structure and the unconscious processes that underlie language production and comprehension. Other theories of grammar have likewise not figured prominently in theories of language processing, at least in recent decades.<sup>1</sup> The focus of this chapter, then, will be on how well the architecture of HPSG comports with available evidence about language production and comprehension.

My argument is much the same as that put forward by Sag et al. (2003: Chapter 9) and Sag & Wasow (2011; 2015), but with some additional observations about the relationship between competence and performance. I presuppose the "competence hypothesis" (see Chomsky 1965: Chapter 1), that is, that a theory of language use (performance) should incorporate a grammar representing the knowledge of language (competence) that is drawn on in everyday comprehension and production, as well as in other linguistic activities, such as language games and the (often artificial) tasks employed in psycholinguistic experiments.

<sup>1</sup>Half a century ago, the Derivational Theory of Complexity (DTC) was an attempt to use psycholinguistic experiments to test aspects of the grammatical theory that was dominant at the time. The DTC was discredited in the 1970s, and the theory it purported to support has long since been superseded. See Fodor et al. (1974) for discussion.

Thomas Wasow. 2021. Processing. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1081–1104. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599866

The primary reason for adopting the competence hypothesis is parsimony: a theory of language use is simpler if it does not have to repeat much the same information about the language in both its production and comprehension components. This information would include things like the vocabulary, the preferred word orders, and most of the rest of what linguists encode in their grammars. A performance theory that incorporates a grammar only needs to include such information once.<sup>2</sup> Moreover, to the extent that the theoretical constructs of the grammar play a role in modeling both production and comprehension, the overall theory is simpler.

There is also, however, an empirical reason for preferring a model with a good fit between competence and performance. As noted by Bresnan et al. (2001), preferences that are only statistical tendencies in some languages can show up in others as categorical requirements. The example they discuss in detail is the avoidance of clauses with third-person subjects but first- or second-person objects or obliques. In English, this is a powerful statistical tendency, which they document by showing that the passivization rate in the Switchboard corpus is very significantly lower when the agent is first- or second-person than when it is third-person. In Lummi (a Salish language of British Columbia), this preference is categorical: clauses with third-person subjects but first- or second-person objects or obliques are simply unacceptable. Hawkins (2004; 2014) argues that such examples are by no means exceptional, and formulates the following "Performance–Grammar Correspondence Hypothesis" (PGCH):

Grammars have conventionalized syntactic structures in proportion to their degree of preference in performance, as evidenced by frequency of use and ease of processing.<sup>3</sup>

<sup>2</sup>There are of course some discrepancies between production and comprehension that need to be accounted for in a full theory of language use. For example, most people can understand some expressions that they never use, including such things as dialect-specific words or accents. But these discrepancies are on the margins of speakers' knowledge of their languages. The vast majority of the words and structures that speakers know are used in both production and comprehension. Further, it seems to be generally true that what speakers can produce is a proper subset of what they can comprehend. Hence, the discrepancies can plausibly be attributed to performance factors such as memory or motor habits. See Gollan et al. (2011) for evidence of differences between lexical access in production and comprehension. See Momma & Phillips (2018) for arguments that the structure-building mechanisms in production and comprehension are the same. For a thoughtful discussion of the relationship between production and comprehension, see MacDonald (2013) and the commentaries published with it.

There are two ways in which a processing model incorporating a grammar might capture this generalization. One is to give up the widespread assumption that grammars provide categorical descriptions, and that any quantitative generalizations must be extra-grammatical; see Francis (2021) for arguments supporting this option, and thoughtful discussion of literature on how to differentiate processing effects from grammar. For example, some HPSG feature structure descriptions might allow multiple values for the same feature, but with probabilities (adding up to 1) attached to each value.<sup>4</sup> I hasten to add that fleshing out this idea into a full-fledged probabilistic version of HPSG would be a large undertaking, well beyond the scope of this chapter; see Linardaki (2006) and Miyao & Tsujii (2008) for work along these lines. But the idea is fairly straightforward, and would allow, for example, English to have *in its grammar* a non-categorical constraint against clauses with third-person subjects and first- or second-person objects or obliques.
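As a rough sketch of what such a probabilistic feature value could look like (the class name and the numbers below are invented for illustration; this is not an existing HPSG formalization):

```python
from dataclasses import dataclass

@dataclass
class ProbValue:
    """A feature value as a probability distribution over alternatives."""
    dist: dict  # value -> probability; the probabilities must sum to 1

    def __post_init__(self):
        assert abs(sum(self.dist.values()) - 1.0) < 1e-9, \
            "probabilities must sum to 1"

    def p(self, value: str) -> float:
        return self.dist.get(value, 0.0)

# A non-categorical constraint: with a third-person subject, the object's
# PERSON value is strongly, but not absolutely, biased against 1st/2nd person.
object_person = ProbValue({"3rd": 0.95, "1st": 0.03, "2nd": 0.02})
assert object_person.p("1st") > 0.0   # dispreferred, yet not ungrammatical
```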

The second way for a theory adopting the competence hypothesis to represent Hawkins's PGCH would be to allow certain generalizations to be stated either as grammatical constraints (when they are categorical) or as probabilistic performance constraints. This requires a fit between the grammar and the other components of the performance model that is close enough to permit what is essentially the same generalization to be expressed in the grammar or elsewhere. In the case discussed by Bresnan et al., for example, treating the constraint in question as part of the grammar of Lummi but a matter of performance in English would require that both the theory of grammar and models of production include, minimally, the distinction between third-person and other persons, and the distinction between subjects and non-subjects. Since virtually all theories of grammar make these distinctions, this observation is not very useful in choosing among theories of grammar. I will return later to phenomena that bear on the choice among grammatical theories, at least if one accepts the competence hypothesis.

<sup>3</sup>In the Bresnan et al. example, I know of no experimental evidence that clauses with third-person subjects and first- or second-person objects are difficult to process. But a plausible case can be made that the high salience of speaker and addressee makes the pronouns referring to them more accessible in both production and comprehension than expressions referring to other entities. In any event, clauses with first- or second-person subjects and third-person objects are far more frequent than clauses with the reverse pattern in languages where this has been checked. Thus, the Bresnan et al. example falls under the PGCH, at least with respect to "frequency of use".

<sup>4</sup>I discussed this idea many times with the late Ivan Sag. He made it clear that he believed grammatical generalizations should be categorical. In part for that reason, this idea was not included in our joint publications on processing and HPSG.

Since its earliest days, HPSG research has been motivated in part by considerations of computational tractability (see Flickinger, Pollard & Wasow 2021, Chapter 2 of this volume, for discussion). Some of the design features of the theory can be traced back to the need to build a system that could run on the computers of the 1980s. Despite the obvious differences between human and machine information processing, some aspects of HPSG's architecture that were initially motivated on computational grounds have turned out to fit well with what is known about human language processing. A prime example of that is the computational analogue to the competence hypothesis, namely the fact that the same grammar is used for parsing and generation. In Section 3, I will discuss a number of other high-level design properties of HPSG, arguing that they fit well with what is known about human language processing, which I summarize in Section 2. In Section 4, I will briefly discuss two phenomena that have been the locus of much discussion about the relationship between grammar and processing, namely island constraints and differences between subject and object relative clauses.

# **2 Key facts about human language processing**

In this section I review a number of well-known general properties of human language processing. Most of them seem evident from subjective experience of language use, but there is supporting experimental evidence for all of them.

# **2.1 Incrementality**

Both language production and comprehension proceed incrementally, from the beginning to the end of an utterance. In the case of production, this is evident from the fact that utterances unfold over time. Moreover, speakers very often begin their utterances without having fully planned them out, as is evident from the prevalence of disfluencies. On the comprehension side, there is considerable evidence that listeners (and readers) begin analyzing input right away, without waiting for utterances to be complete. A grammatical framework that assigns structure and meaning to initial substrings of sentences will fit more naturally than one that doesn't into a processing model that exhibits this incrementality we see in human language use.

### 24 Processing

I hasten to add that there is also good evidence that both production and comprehension involve anticipation of later parts of sentences. While speakers may not have their sentences fully planned before they begin speaking, some planning of downstream words must take place. This is perhaps most evident from instances of nouns exhibiting quirky cases determined by verbs that occur later in the clause. For example, objects of German *helfen*, 'help', take the dative case, rather than the default accusative for direct objects. But in a sentence like (1), the speaker must know that the verb will be one taking a dative object at the time the dative case article *dem* is uttered.

(1) Wir werden dem Kind bald helfen. (German)
we will the.DAT child soon help
'We will help the child soon.'

Likewise, in comprehension there is ample evidence that listeners and readers anticipate what is to come. This has been demonstrated using a variety of experimental paradigms. Eye-tracking studies (see Tanenhaus et al. 1995, Altmann & Kamide 1999, Arnold et al. 2007, among many others) have shown that listeners use semantic information and world knowledge to predict what speakers will refer to next.

Thus, a theory of grammar that fits comfortably into a model of language use should provide representations of initial substrings of utterances that can be assigned (partial) meanings and be used in predicting later parts of those utterances.

# **2.2 Non-modularity**

Psycholinguistic research over the past four decades has established that language processing involves integrating a wide range of types of information on an as-needed basis. That is, the various components of the language faculty interact throughout their operation. A model of language use should therefore *not* be modular, in the sense of Jerry Fodor's influential 1983b book, *The Modularity of Mind*. 5

<sup>5</sup>Much of the psycholinguistic research of the 1980s was devoted to exploring modularity – that is, the idea that the human linguistic faculty consists of a number of distinct "informationally encapsulated" modules. While Fodor's book was mostly devoted to arguing for modularity at a higher level, where the linguistic faculty was one module, many researchers at the time extended the idea to the internal organization of the linguistic faculty, positing largely autonomous mechanisms for phonology, morphology, syntax, semantics, and pragmatics, with the operations of each of these sub-modules unaffected by the operations of the others. The outcome of years of experimental studies on the linguistic modularity idea was that it was abandoned by most psycholinguists. For an early direct response to Fodor, see Marslen-Wilson & Tyler (1987).


Some casual observations argue against modular language processing. For example, the famously ambiguous sentences (2a) and (2b) can be disambiguated in speech by the stress patterns.

(2) a. I forgot how good beer tastes.
b. Dogs must be carried.

The two meanings of (2a) correspond to two different parses (one with *good* as part of the noun phrase *good beer* and the other with *how good* as a verb phrase modifier). The two meanings of (2b) have the same syntactic structure, but differ in whether the requirement is that all dogs be carried, or that everyone carry a dog. This interaction of prosody with syntax (in the case of (2a)) and with semantics (in the case of (2b)) is produced and perceived before the end of the utterance, suggesting that phonological information is available in the course of syntactic and semantic processing.

Moreover, non-linguistic knowledge influences the disambiguation in both of these cases. If (2a) is preceded by "I just finished three weeks without alcohol", the natural interpretation of *good* is as a modifier of *tastes*; but following "I just finished three weeks drinking only Bud Light", *good* is more naturally interpreted as a modifier of *beer*. In the case of (2b), only one interpretation (that anyone with a dog must carry it) is plausible, given our knowledge of the world. Indeed, most non-linguists fail to see the ambiguity of (2b) without a lengthy explanation.

More rigorous evidence of the non-modular character of language processing has been provided by a variety of types of experiments. The work of Michael Tanenhaus and his associates, using eye-tracking to investigate the time-course of sentence comprehension, played an important role in convincing most psycholinguists that human language understanding is non-modular. See, for example, Eberhard et al. (1995), McMurray et al. (2008), Tanenhaus et al. (1995), Tanenhaus et al. (1996), and Tanenhaus & Trueswell (1995). A recent survey of work arguing against modularity in language processing is provided by Spevack et al. (2018).

# **2.3 Importance of words**

The individual properties of words play a central role in how people process phrases and sentences. Consider, for example, what is probably the most famous sentence in psycholinguistics, (3), due originally to Bever (1970: 320).


(3) The horse raced past the barn fell.

The extreme difficulty that people who have not previously been exposed to (3) have comprehending it depends heavily on the choice of words. A sentence like (4), with the same syntactic structure, is far easier to parse.

(4) The applicant interviewed in the morning left.

Numerous studies (e.g. Ford et al. 1982; Trueswell et al. 1993; MacDonald et al. 1994; Bresnan et al. 2007; Wasow et al. 2011) have shown that such properties of individual words as subcategorization preferences, semantic categories (e.g. animacy), and frequency of use can influence the processing of utterances.

# **2.4 Influence of context**

Much of the evidence against modularity of the language faculty is based on the influences of non-linguistic context and world knowledge on language processing. The well-known McGurk effect (McGurk & MacDonald 1976) and the Stroop effect (Stroop 1935) demonstrate that, even at the word level, visual context can influence linguistic comprehension and production.

Linguistic context also clearly influences processing, as the discussion of examples (2a) and (2b) above illustrates. The same conclusion is supported by numerous controlled studies, including, among many others, those described by Crain & Steedman (1985), Altmann & Steedman (1988), Branigan (2007), Traxler & Tooley (2007), Matsuki et al. (2011), and Spevack et al. (2018). The last of these references concludes (p. 11), "when humans and their brains are processing language with each other, there is no format of linguistic information (e.g., lexical, syntactic, semantic, and pragmatic) that cannot be rapidly influenced by context."

# **2.5 Speed and accuracy of processing**

A good deal of psycholinguistic literature is devoted to exploring situations in which language processing encounters difficulties, notably work on garden paths (in comprehension) and disfluencies (in production). Much more striking than the existence of these phenomena, however, is how little they matter in everyday language use. While ambiguities abound in normal sentences (see Wasow 2015 and also Bender & Emerson 2021: Section 2.2.2, Chapter 25 of this volume), comprehenders very rarely experience noticeable garden paths. Similarly, disfluencies in spontaneous speech occur in nearly every sentence but rarely disrupt communication.


People are able to use speech to exchange information remarkably efficiently. A successful account of human language processing must explain why it works as well as it does.

# **3 Features of HPSG that fit well with processing facts**

In this section, I review some basic design features of HPSG, pointing out ways in which they comport well with the properties of language processing listed in the previous section.

# **3.1 Constraint-based**

Well-formedness of HPSG representations is defined by the simultaneous satisfaction of a set of constraints that constitutes the grammar (Richter 2021: 95, Chapter 3 of this volume). This lack of directionality allows the same grammar to be used in modeling production and comprehension.

Consider, for instance, the example of quirky case assignment illustrated in (1) above. A speaker uttering (1) would need to have planned to use the verb *helfen* before beginning to utter the object NP. But a listener hearing (1) would encounter the dative case on the article *dem* before hearing the verb and could infer only that a verb taking a dative object was likely to occur at the end of the clause. Hence, the partial mental representations built up by the two interlocutors during the course of the utterance would be quite different. But the grammatical mechanism licensing the combination of a dative object with this particular verb is the same for speaker and hearer.

In contrast, theories of grammar that utilize sequential operations to derive sentences impose a directionality on their grammars. If such a grammar is then to be employed as a component in a model of language use (as the competence hypothesis stipulates), its inherent directionality becomes part of the models of both production and comprehension. But production involves mapping meaning onto sound, whereas comprehension involves the reverse mapping. Hence, a directional grammar cannot fit the direction of processing for both production and comprehension.<sup>6</sup>

<sup>6</sup>This was an issue for early work in computational linguistics that built parsers based on the transformational grammars of the time, which generated sentences using derivations whose direction went from an underlying structure largely motivated by semantic considerations to the observable surface structure. See, for example, Hobbs & Grishman (1975).


Branigan & Pickering (2017) argue at length that "structural priming provides an implicit method of investigating linguistic representations".<sup>7</sup> They go on to conclude (p. 14) that the evidence from priming supports "frameworks that … assume nondirectional and constraint-based generative capacities (i.e., specifying well-formed structures) that do not involve movement".<sup>8</sup> HPSG is one of the frameworks they mention that fit this description.

# **3.2 Surface-oriented**

The features and values in HPSG representations are motivated by straightforwardly observable linguistic phenomena. HPSG does not posit derivations of observable properties from abstract underlying structures. In this sense it is surfaceoriented.

The evidence linguists use in formulating grammars consists of certain types of performance data, primarily judgments of acceptability and meaning. Accounts of the data necessarily involve some combination of grammatical and processing mechanisms. The closer the grammatical descriptions are to the observable phenomena, the less complex the processing component of the account needs to be.

For example, the grammatical theory of Kayne (1994), which posits a universal underlying order of specifier-head-complement, requires elaborate (and directional) transformational derivations to relate these underlying structures to the observable data in languages whose surface order is different (a majority of the languages of the world). In the absence of experimental evidence that the production and comprehension of sentences with different constituent orders involve mental operations corresponding to the grammatical derivations Kayne posits, his theory of grammar seems to be incompatible with the competence hypothesis.

Experimental evidence supports this reasoning. As Branigan & Pickering (2017: 9) conclude, "[P]riming evidence supports the existence of abstract syntactic representations. It also suggests that these are shallow and monostratal in a way that corresponds at least roughly to the assumptions of […] Pollard & Sag (1994) […]. It does not support a second, underlying level of syntactic structure or the syntactic representation of empty categories associated with the movement of constituents in some transformational analyses."

<sup>7</sup>Priming is the tendency for speakers to re-use linguistic elements that occurred earlier in the context; structural priming (which Branigan & Pickering sometimes call *abstract priming*) is priming of linguistic structures, abstracted from the particular lexical items in those structures.

<sup>8</sup>Branigan & Pickering's conclusions are controversial, as is evident from the commentaries accompanying their target article.

# **3.3 Informationally rich representations**

The feature structure descriptions of HPSG include all types of linguistic information relevant to the well-formedness and interpretation of expressions. This includes phonological, morphological, syntactic, semantic, and contextual information. They can also incorporate non-linguistic contextual information (e.g. social information), though this has not been extensively explored.

The cooccurrence of these different types of information within a single representation facilitates modeling production and comprehension processes that make reference to more than one of them. The architecture of the grammar is thus well suited to the non-modularity and context-sensitivity of language processing. It is interesting in this regard to consider the conclusions of two papers by psycholinguists who surveyed experimental evidence and inferred what types of grammatical information were essential for processing.

The following series of quotes captures the essence of what MacDonald et al. (1994) wrote regarding lexical representation, based on a survey of a wide range of psycholinguistic studies:


With the possible exception of "X-bar structure", this sounds very much like a description of the types of information included in HPSG feature structure descriptions.

Over twenty years later, Branigan & Pickering (2017) came to the following conclusions about linguistic representations, based on priming studies:

<sup>9</sup>A reviewer asked what feature of HPSG this maps into. The answer is straightforward: a word's phonological form, semantics, grammatical features, morphology, and argument structure are all represented together in one feature structure description, and the different pieces of the description may be linked through coindexing or tagging.



The two lists are quite different. This is in part because the focus of the earlier paper was on lexical representations, whereas the later paper was on linguistic representations more generally. It may also be attributable to the fact that MacDonald et al. framed their paper around the issue of ambiguity resolution, while Branigan & Pickering's paper concentrated on what could be learned from structural priming studies. Despite these differences, it is striking that the conclusions of both papers about the mental representations employed in language processing are very much like those arrived at by work in HPSG.

# **3.4 Lexicalism**

A great deal of the information used in licensing sentences in HPSG is stored in the lexical entries for words (see Müller & Wechsler 2014 and also Abeillé & Borsley 2021: Section 4, Chapter 1 of this volume). A hierarchy of lexical types permits commonalities to be factored out to minimize what has to be stipulated in individual entries, but the information in the types gets into the representations of phrases and sentences through the words that instantiate those types. Hence, it is largely the information coming from the words that determines the well-formedness of larger expressions. Any lexical decomposition would have to be strongly motivated by the morphology.

Branigan & Pickering (2017: Section 2.3) note that grammatical structures (what some might call *constructions*) such as V-NP-NP can prime the use of the same abstract structure, even in the absence of lexical overlap. But they also note that the priming is consistently significantly stronger when the two instances share the same verb, a fact known as *the lexical boost*. They write, "To explain abstract priming, lexicalist theories must assume that the syntactic representations […] are shared across lexical entries." (p. 12) The types in HPSG's lexicon provide just such representations. Branigan & Pickering go on to say that the lexical boost argues for "a representation that encodes a binding between constituent structure and the lemma […] of the lexical entry for the head." In HPSG, this "binding" is simply the fact that the word providing the lexical boost (say, *give*) is an instantiation of a type specifying the structures it appears in (e.g. the ditransitive verb type, see also Yi, Koenig & Roland 2019).

Similarly, the fact, noted in Section 2.3 above, that a given structure may be more or less difficult to process depending on word choice is unsurprising in HPSG, so long as the processor has access to information about individual words and not just their types.

# **3.5 Underspecification**

HPSG allows a class of linguistic structures that share some feature values to be characterized by means of feature structure descriptions that specify only the features whose values are shared. Such underspecification is very useful for a model of processing (particularly a model of the comprehender) because it allows partial descriptions of the utterance to be built up, based on the information that has been encountered. This property of the grammar makes it easy to incorporate into an incremental processing model.
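The point can be illustrated with a toy unification step over partial descriptions; flat attribute-value dictionaries stand in here for typed feature structures, and the feature names are simplified for the sketch.

```python
def unify(d1: dict, d2: dict):
    """Combine two partial descriptions; return None if they conflict."""
    result = dict(d1)
    for feat, val in d2.items():
        if feat in result and result[feat] != val:
            return None                     # incompatible feature values
        result[feat] = val
    return result

# Hearing German "dem" constrains the upcoming NP without fully determining
# it; encountering "Kind" then adds information monotonically.
dem = {"CASE": "dat", "NUM": "sg"}
kind = {"NUM": "sg", "GEND": "neut"}
assert unify(dem, kind) == {"CASE": "dat", "NUM": "sg", "GEND": "neut"}
assert unify(dem, {"CASE": "acc"}) is None  # conflicting hypotheses are rejected
```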

# **4 Two phenomena of interest**

# **4.1 Island constraints**

Ever since Ross's seminal dissertation (1967) introduced the notion of "island constraints", linguists have sought explanations for their existence, often suggesting that they were motivated by processing considerations (notably Grosu 1972; Fodor 1983a; Deane 1991). The basic idea is that island constraints restrict the search space the parser needs to consider in looking for a gap to match a filler it has encountered, thereby facilitating processing. This then raises the question of whether island constraints need to be represented in grammar (language particular or universal), or can be attributed entirely to processing and/or other factors, such as pragmatics.

In principle, this question is orthogonal to the choice among theories of grammar. But in recent years, a controversy has arisen between some proponents of HPSG and certain transformational grammarians, with the former (e.g. Chaves 2012; 2021, Chapter 15 of this volume; Hofmeister & Sag 2010; Hofmeister, Jaeger, Arnon, Sag & Snider 2013; Chaves & Putnam 2020) arguing that certain island phenomena should be attributed entirely to extra-grammatical factors, and the latter (e.g. Phillips 2013 and Sprouse et al. 2012) arguing that island constraints are part of grammar.

I will not try to settle this dispute here. Rather, my point in this subsection is to note that a theory in which there is a close fit between the grammar and processing mechanisms allows for the possibility that some island phenomena should be attributed to grammatical constraints, whereas others should be explained in terms of processing. Indeed, if the basic idea that islands facilitate processing is correct, it is possible that some languages, but not others, have grammaticalized some islands, but not others. That is, in a theory in which the grammar is a tightly integrated component of a processing model, the question of whether a particular island phenomenon is due to a grammatical constraint is an empirical one whose answer might differ from language to language.

Early work on islands (e.g. Ross 1967 and Chomsky 1973) assumed that, in the absence of negative evidence, island constraints could not be learned and hence must be innate and therefore universal. But cross-linguistic variation in island constraints, even between closely related languages, has been noted since the early days of research on the topic (see e.g. Erteschik-Shir 1973 and Engdahl & Ejerhed 1982).

This situation is what one might expect if languages differ with respect to the extent to which the processing factors that motivate islandhood have been grammaticalized. In short, a theory with a tight fit between its grammatical machinery and its processing mechanisms allows for hybrid accounts of islands that are not available to theories without such a fit.

One example of such a hybrid is Chaves's (2012) account of Ross's Coordinate Structure Constraint. Following much earlier work, Chaves distinguishes between the "conjunct constraint", which prohibits a gap from serving as a conjunct in a coordinate structure (as in \**What did you eat a sandwich and?*), and the "element constraint", which prohibits a gap from serving as an element of a larger conjunct (as in \**What did you eat a sandwich and a slice of?*). The conjunct constraint, he argues, follows from the architecture of HPSG and is therefore built into the grammar. The element constraint, on the other hand, has exceptions and, he claims, should be attributed to extra-grammatical factors. See Chaves (2021), Chapter 15 of this volume for a more detailed discussion of islands.

# **4.2 Subject vs. object relative clauses**

One of the most discussed phenomena in the literature on human sentence processing is the difference in processing complexity between relative clauses (RCs) in which the gap is the subject and those in which the gap is the object – or, as they are commonly called, "subject RCs" and "object RCs"; see, among many others, Wanner & Maratsos (1978), Gibson (1998), Traxler et al. (2002), and Gennari & MacDonald (2008). Relative clause processing complexity has been shown to be influenced by a number of factors other than the grammatical function of the gap, including the animacy and pronominality of the overt NP in the RC, as well as the frequency, animacy, and discourse properties of the head of the RC.<sup>10</sup> When these factors are controlled for, however, most psycholinguists accept that it has been established that subject RCs are generally easier to process than object RCs, at least in English.<sup>11</sup>

One approach to explaining this asymmetry has been based on the distance between the filler and the gap (see, among others, Wanner & Maratsos 1978; Gibson 1998; Hawkins 2004). In languages like English, with basic SVO clause order and RCs that follow the nouns they modify, the distance between the filler (the relativizer or head noun) and the gap is greater for an object gap than for a subject gap. If holding a filler in memory until the gap is encountered puts an extra burden on the processor, this could explain why object RCs are harder to process than subject RCs. This distance-based account makes an interesting prediction for languages with different word orders. In languages like Japanese, with SOV order and RCs that precede the nouns they modify, the distance relationships are reversed – that is, the gaps in object RCs are closer to their fillers than those in subject RCs. The same is true of Chinese, with basic SVO order and RCs that precede the nouns they modify. So the prediction of distance-based accounts of the subject/object RC processing asymmetry is that it should be reversed in these languages.

<sup>10</sup>The stimuli in the experimental studies on this topic always have RCs with one overt NP, either in subject or object position, and a gap corresponding to the other grammatical function. In most of the studies, that NP is non-pronominal and animate. See Reali & Christiansen (2007) and Roland et al. (2012) for evidence of the role of these factors in processing complexity.

<sup>11</sup>This processing difference corresponds to the top end of the "accessibility hierarchy" that Keenan & Comrie (1977) proposed as a linguistic universal. Based on a diverse sample of 50 languages, they proposed the hierarchy below, and hypothesized that any language allowing RC gaps at any point in the hierarchy would allow RC gaps at all points higher (to the left) on the hierarchy.

(i) Subject > Direct Object > Indirect Object > Oblique > Genitive > Object of Comparison

Keenan & Comrie speculated that the generality of this hierarchy of relativizability lay in processing, specifically on the comprehension side. The extensive experimental evidence that has been adduced in support of this idea in the intervening decades has been concentrated on subject RCs vs. (direct) object RCs. The remainder of the hierarchy remains largely untested by psycholinguists.

The experimental evidence on this prediction is somewhat equivocal. While Hsiao & Gibson (2003) found a processing preference for object RCs over subject RCs in Chinese, their findings were challenged by Lin & Bever (2006) and Vasishth et al. (2013), who claimed that Chinese has a processing preference for subject RCs. In Japanese, Miyamoto & Nakamura (2003) found that subject RCs were processed more easily than object RCs. The issue remains controversial, but, for the most part, the evidence has not supported the idea that the processing preference between subject RCs and object RCs varies across languages with different word orders.

The most comprehensive treatment of English RCs in HPSG is Sag (1997). Based entirely on distributional evidence, Sag's analysis treats (finite) subject RCs as fundamentally different from RCs whose gap does not function as the subject of the RC. The difference is that the SLASH feature, which encodes information about long-distance dependencies in HPSG, plays no role in the analysis of subject RCs. Non-subject RCs, on the other hand, involve a non-empty SLASH value in the RC.<sup>12</sup>

Sag deals with a wide variety of kinds of RCs. From the perspective of the processing literature, the two crucial kinds are exemplified by (5a) and (5b), from Gibson (1998: 2).

(5) a. The reporter who attacked the Senator admitted the error.
b. The reporter who the Senator attacked admitted the error.

A well-controlled experiment on the processing complexity of subject and object RCs must have stimuli that are matched in every respect except the role of the gap in the RC. Thus, the conclusion that object RCs are harder to process than subject RCs is based on a wide variety of studies using stimuli like (5). Sag's analysis of (5a) posits an empty SLASH value in the RC, whereas his analysis of (5b) posits a non-empty SLASH value.

<sup>12</sup>The idea that at least some subject gaps differ in this fundamental way from non-subject gaps goes back to Gazdar (1981: 171–172).

There is considerable experimental evidence supporting the idea that unbounded dependencies – that is, what HPSG encodes with the SLASH feature – add to processing complexity; see, for example, Wanner & Maratsos (1978), King & Just (1991), Kluender & Kutas (1993), and Hawkins (1999). Combined with Sag's HPSG analysis of English RCs, this provides an explanation of the processing preference of subject RCs over object RCs. On such an account, the question of which other languages will exhibit the same preference boils down to the question of which other languages have the same difference in the grammar of subject and object RCs. At least for English, this is a particularly clear case in which the architecture of HPSG fits well with processing evidence.

# **5 Conclusion**

This chapter opened with the observation that HPSG has not served as the theoretical framework for much psycholinguistic research. The observations in Sections 2 through 4 argue for rectifying that situation. The fit between the architecture of HPSG and what is known about human sentence processing suggests that HPSG could be used to make processing predictions that could be tested in the lab.

To take one example, the explanation of the processing asymmetry between subject and object RCs offered above is based on a grammatical difference in the HPSG analysis: all else being equal, expressions with non-empty SLASH values are harder to process than those with empty SLASH values. Psycholinguists could test this idea by looking for other cases of phenomena that look superficially very similar but whose HPSG analyses differ with respect to whether SLASH is empty. One such case occurs with pairs like Chomsky's (1977: 103) famous minimal pair in (6).

(6) a. Chris is eager to please.
b. Chris is easy to please.

Under the analysis of Pollard & Sag (1994: Section 4.3), *to please* in (6b) has a non-empty SLASH value but an empty SLASH value in (6a). Processing (6a) should therefore be easier. This prediction could be tested experimentally, and modern methods such as eye-tracking could pinpoint the locus of any difference in processing complexity to determine whether it corresponds to the region where the grammatical analysis involves a difference in SLASH values.


The current disconnect between theoretical investigations of language structure and psycholinguistic studies is an unfortunate feature of our discipline. Because HPSG comports so well with what is known about processing, it could serve as the basis for a reconnection between these two areas of study.

# **Acknowledgments**

A number of people made valuable suggestions on a preliminary outline and an earlier draft of this chapter, leading to improvements in content and presentation, as well as the inclusion of previously overlooked references. In particular, I received helpful comments from (in alphabetical order): Emily Bender, Bob Borsley, Rui Chaves, Danièle Godard, Jean-Pierre Koenig, and Stefan Müller. Grateful as I am for their advice, I take sole responsibility for any shortcomings in the chapter.

# **References**


Yi, Eunkyung, Jean-Pierre Koenig & Douglas Roland. 2019. Semantic similarity to high-frequency verbs affects syntactic frame selection. *Cognitive Linguistics* 30(3). 601–628. DOI: 10.1515/cog-2018-0029.

# **Chapter 25**

# **Computational linguistics and grammar engineering**

# Emily M. Bender University of Washington

# Guy Emerson

University of Cambridge

We discuss the relevance of HPSG for computational linguistics, and the relevance of computational linguistics for HPSG, including: the theoretical and computational infrastructure required to carry out computational studies with HPSG; computational resources developed within HPSG; how those resources are deployed, for both practical applications and linguistic research; and finally, a sampling of linguistic insights achieved through HPSG-based computational linguistic research.

# **1 Introduction**

From the inception of HPSG in the 1980s, there has been a close integration between theoretical and computational work (for an overview, see Flickinger, Pollard & Wasow 2021, Chapter 2 of this volume). In this chapter, we discuss computational work in HPSG, starting with the infrastructure that supports it (both theoretical and practical) in Section 2. Next we describe several existing largescale projects which build HPSG or HPSG-inspired grammars (see Section 3) and the deployment of such grammars in applications including both those within linguistic research and otherwise (see Section 4). Finally, we turn to linguistic insights gleaned from broad-coverage grammar development (see Section 5).

Emily M. Bender & Guy Emerson. 2021. Computational linguistics and grammar engineering. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1105–1153. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599868


# **2 Infrastructure**

# **2.1 Theoretical considerations**

There are several properties of HPSG as a theory that make it well-suited to computational implementation. First, the theory is kept separate from the formalism: the formalism is expressive enough to encode a wide variety of possible theories. While some theoretical work does argue for or against the necessity of particular formal devices (e.g., the shuffle operator; Reape 1994), much of it proceeds within shared assumptions about the formalism. This is in contrast to work in the context of the Minimalist Program (Chomsky 1995), where theoretical results are typically couched in terms of modifications to the formalism itself. From a computational point of view, the benefit of differentiating between theory and formalism is that the formalism is relatively stable. That enables the development and maintenance of software systems that target the formalism (Boguraev et al. 1988), such as software for parsing, generation, and grammar exploration (see Section 3 below for some examples).<sup>1</sup>

A second important property of HPSG that supports a strong connection between theoretical and computational work is an interest in both so-called "core" and so-called "peripheral" phenomena. Most implemented grammars are built with the goal of handling naturally occurring text.<sup>2</sup> This means that they will need to handle a wide variety of linguistic phenomena not always treated in theoretical syntactic work (Baldwin et al. 2005). A syntactic framework that discounts research on "peripheral" phenomena as uninteresting provides less support for implementational work than does one, like HPSG or Construction Grammar, that values such topics (for a comparison of HPSG and Construction Grammar, see Müller 2021b, Chapter 32 of this volume).

<sup>1</sup>There are implementations of Minimalism, notably Stabler (1997) and Herring (2016). Most recently, Torr (2019) developed a broad-coverage, treebank-trained Minimalist parser. However, implementing a theory requires fixing the formalism, and so these implementations are unlikely to be useful for testing theoretical ideas if the formalism moves on. See Borsley & Müller (2021: Section 2.1), Chapter 28 of this volume for further discussion.

<sup>2</sup>It is possible, but less common, to do implementation work strictly against test suites of sentences constructed specifically to focus on phenomena of interest.

Finally, the type hierarchy characteristic of HPSG lends itself well to developing broad-coverage grammars which are maintainable over time (see Sygal & Wintner 2011). The use of the type hierarchy to manage complexity at scale comes out of the project at HP Labs where HPSG was originally developed (Flickinger et al. 1985; Flickinger 1987). The core idea is that any given constraint is (ideally) expressed only once, on a type which serves as a supertype to all entities that bear that constraint.<sup>3</sup> Such constraints might represent broad generalizations that apply to many entities or relatively narrow, idiosyncratic properties that apply to only a few. By isolating any given constraint on one type (as opposed to repeating it in multiple places), we build grammars that are easier to update and adapt in light of new data that require refinements to constraints. Having a single locus for each constraint also makes the types a very useful target for documentation (Hashimoto et al. 2008) and grammar exploration (Letcher 2018).
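The single-locus idea can be sketched with ordinary class inheritance, where each constraint is stated once on a supertype and collected down the hierarchy; the type names and constraints below are invented for illustration and stand in for the typed feature structure formalism itself.

```python
class Type:
    constraints = {}

    @classmethod
    def all_constraints(cls) -> dict:
        """Collect constraints from all supertypes, most general first,
        so that a subtype may refine what it inherits."""
        result = {}
        for klass in reversed(cls.__mro__):
            result.update(getattr(klass, "constraints", {}))
        return result

class Verb(Type):
    constraints = {"HEAD": "verb"}            # stated once, inherited below

class TransitiveVerb(Verb):
    constraints = {"COMPS": ["NP"]}

class DitransitiveVerb(TransitiveVerb):
    constraints = {"COMPS": ["NP", "NP"]}     # refines the inherited valence

assert DitransitiveVerb.all_constraints() == {"HEAD": "verb",
                                              "COMPS": ["NP", "NP"]}
```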

# **2.2 Practical considerations**

HPSG allows practical implementations because it uses a well-defined formalism.<sup>4</sup> Furthermore, because HPSG is defined to be bi-directional, an implemented grammar can be used for both parsing and generation. In this section, we discuss how HPSG allows tractable algorithms, which enables linguists to empirically test hypotheses and which also enables HPSG grammars to be used in a range of applications, as we will see in Sections 4.1 and 4.2, respectively.

### **2.2.1 Computational complexity**

One way to measure how easy or difficult it is to use a syntactic theory in practical computational applications is to consider the *computational complexity*<sup>5</sup> of parsing and generation algorithms (Gazdar & Pullum 1985). Computational complexity includes both how much memory and how much computational time a parsing algorithm needs to process a particular sentence.<sup>6</sup> Considering parsing time, longer sentences will take longer to process, but the more complex the algorithm is, the more quickly the amount of processing time increases. Parsing complexity can thus be measured by considering sentences containing $n$ tokens, and then increasing $n$ to see how the amount of time changes. This can be done based on the average amount of time for sentences in a corpus (average-case complexity), or based on the longest amount of time for all theoretically possible sentences (worst-case complexity).

<sup>3</sup>Originally this only applied to lexical entries in Flickinger's work. Now it also applies to phrase structure rules, lexical rules, and types below the level of the sign which are used in the definition of all of these. See Flickinger, Pollard & Wasow (2021: Section 6), Chapter 2 of this volume for further discussion.

<sup>4</sup>See Richter (2021), Chapter 3 of this volume for further discussion. To clarify a potentially confusing terminological point, much theoretical work in HPSG, including Pollard & Sag (1994), distinguishes between fully resolved feature structures and possibly underspecified feature structure descriptions. Much computational work, by contrast, operates entirely with partially specified feature structures, at both the level of grammar and the level of analyses licensed by the grammar. In keeping with this tradition, we use the term "feature structure" to refer to both fully specified and partially specified objects, and have no need for the term "feature structure description".

<sup>5</sup>Computational complexity is related to the complexity hierarchy of language classes in formal language theory. More complex language classes tend to require parsing and generation algorithms with higher computational complexity, as illustrated by the Chomsky Hierarchy (Chomsky 1963; Hopcroft & Ullman 1969) and the Weir Hierarchy (Weir 1992). However, this relationship is not exact. For example, the class of strictly local languages is a proper subset of the class of regular languages, but both classes can be parsed in linear time (Jäger & Rogers 2012). Similarly, there are proper supersets of the class of context-free languages which do not require additional computational complexity (Boullier 1999). Müller (2019: Chapter 17) discusses HPSG from the point of view of formal language theory.

At first sight, analyzing computational complexity would seem to paint HPSG in a bad light, because the formalism allows us to write grammars which can be arbitrarily complex; in technical terminology, the formalism is *Turing-complete* (Johnson 1988: Section 3.4). However, as discussed in the previous section, there is a clear distinction between theory and formalism. Although the HPSG formalism rules out the possibility of efficient algorithms that could cope with any possible feature-structure grammar, a particular theory (or a particular grammar) might well allow efficient algorithms.

Keeping processing complexity manageable is handled differently in other computationally-friendly frameworks, such as Combinatory Categorial Grammar (CCG),<sup>7</sup> or Tree Adjoining Grammar (TAG; Joshi 1987; Schabes et al. 1988). The formalisms of CCG and TAG inherently limit computational complexity: for both of them, as the sentence length increases, worst-case parsing time is proportional to $n^6$ (Kasami et al. 1989). This is a deliberate feature of these formalisms, which aim to be just expressive enough to capture human language, and not any more expressive. Building this kind of constraint into the formalism itself highlights a different school of thought from HPSG. Indeed, Müller (2015: 64) explicitly argues in favor of developing linguistic analyses first, and improving processing efficiency second. As discussed above in Section 2.1, separating the formalism from the theory means that the formalism is stable, even as the theory develops.

It would be beyond the scope of this chapter to give a full review of parsing algorithms, but it is instructive to give an example. For grammars that have a context-free backbone (every analysis can be expressed as a phrase-structure tree plus constraints between mother and daughter nodes), it is possible to adapt the

<sup>6</sup> In this section, we only consider parsing algorithms, but a similar analysis can be done for generation (e.g., Carroll et al. 1999).

<sup>7</sup>For an introduction, see Steedman & Baldridge (2011). For a comparison with HPSG, see Kubota (2021), Chapter 29 of this volume.


standard *chart parsing* algorithm (Kay 1973) for context-free grammars. The basic idea is to parse "bottom-up", starting by finding analyses for each token in the input, and then finding analyses for increasingly longer sequences of tokens (called *spans*), until the parser reaches the entire sentence.

For a context-free grammar, there is a finite number of nonterminal symbols, and each span is analyzed as a subset of the nonterminals. For a feature-structure grammar, each span must be analyzed as a set of feature structures, which makes the algorithm more complicated. In principle, a grammar may allow an infinite number of possible feature structures, for example if it includes recursive unary rules. However, if we can bound the number of possible feature structures as *N*, then the worst-case parsing time is proportional to *N*<sup>2</sup>*n*<sup>*K*+1</sup>, where *K* is the maximum number of children in a phrase-structure rule (Carroll 1993: Section 3.2.3). This is less complex than for an arbitrary grammar (which means that this class of grammars is *not* Turing-complete), but *N* may nonetheless be very large.
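The bottom-up, span-based strategy just described can be sketched in a few lines of Python for a binary context-free backbone. The toy lexicon and rules below are invented for illustration; in an HPSG parser, the table lookup over atomic categories would be replaced by unification of feature structures.

```python
from collections import defaultdict

def chart_parse(tokens, lexicon, rules):
    """Bottom-up chart parsing over spans, for binary-branching rules.

    lexicon: maps a token to a set of categories.
    rules:   maps a pair (left_cat, right_cat) to a set of mother categories.
    """
    n = len(tokens)
    chart = defaultdict(set)         # (start, end) -> categories for that span
    for i, token in enumerate(tokens):
        chart[i, i + 1] |= lexicon.get(token, set())
    for width in range(2, n + 1):    # increasingly longer spans
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):     # binary split of the span
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        chart[start, end] |= rules.get((left, right), set())
    return chart[0, n]               # analyses spanning the whole sentence

# Toy grammar, invented for illustration:
lexicon = {"the": {"Det"}, "cherry": {"N"}, "tree": {"N"}, "blossomed": {"V"}}
rules = {("Det", "N"): {"NP"}, ("N", "N"): {"N"}, ("NP", "V"): {"S"}}
print(chart_parse("the cherry tree blossomed".split(), lexicon, rules))  # {'S'}
```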

But is the number of possible feature structures bounded in implemented HPSG grammars? For DELPH-IN grammars (see Section 3.2), the answer is yes. Assuming a system without relational constraints, the potential for unboundedness in the number of feature structures stems from the potential for recursion in feature paths: a list is a simple example,<sup>8</sup> and as another example, the elements on a COMPS list also include the feature COMPS.
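The standard list encoding (described in footnote 8) can be sketched with nested Python dicts standing in for typed feature structures; the sketch shows why the path REST can in principle be extended without bound.

```python
# The type *list* has subtypes *null* and *non-empty-list*; a non-empty
# list has the features FIRST and REST, with REST again of type *list*.
NULL = {"type": "null"}

def cons(first, rest):
    return {"type": "non-empty-list", "FIRST": first, "REST": rest}

comps = cons("NP", cons("PP", NULL))   # a two-element COMPS list
print(comps["REST"]["FIRST"])          # 'PP', reached via the path REST|FIRST
# Since REST is itself of type *list*, the path REST|REST|... can be
# iterated without bound -- the source of potential unboundedness.
```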

However, in practice, such recursive paths do not need to be considered by the parsing algorithm. For example, selecting heads might place constraints on their complements' subjects (e.g., in raising/control constructions), but no deeper than that (e.g., not on a complement's complement's subject). Similarly, while lists that are potentially unbounded in length are used in semantic representations, these are never involved in constraining grammaticality. The only lists that constrain grammaticality are valence lists, but in practical grammars these are never longer than four or five elements.<sup>9</sup>

When parsing real corpora, it turns out that the average-case complexity is much better than might be expected (Carroll 1994). On the one hand,

<sup>8</sup>More precisely, in the standard implementation of a list as a feature structure, the type *list* has two subtypes *null* and *non-empty-list*, and *non-empty-list* has the features FIRST and REST, where the value of REST is of type *list*. This means that the value of REST can itself have the feature REST. See also Richter (2021: 102), Chapter 3 of this volume on lists.

<sup>9</sup> In part, this is because DELPH-IN does not adopt proposals like the DEPS list of Bouma, Malouf & Sag (2001). Furthermore, in many DELPH-IN grammars, including the English Resource Grammar (ERG), the SLASH list cannot have more than one element. If unbounded valence lists or SLASH lists are required, such as to model cross-serial dependencies (Rentier 1994; see also Godard & Samvelian 2021, Chapter 11 of this volume), the number of possible structures might still be bounded as a function of sentence length; this would allow us to bound worst-case parsing complexity, but it will be a higher bound.


grammatical constructions do not generally combine in the worst-case way, and on the other, when a grammar writer is confronted with multiple possible analyses for a particular construction, they may opt for the analysis that is more efficient for a particular parsing algorithm (Flickinger 2000). To measure the efficiency of grammars and parsing algorithms in practice, it can be helpful to use a test suite composed of a representative sample of sentences (Oepen & Flickinger 1998).

# **2.2.2 Parse ranking**

Various kinds of ambiguity are well-known in linguistics (such as modifier attachment and part-of-speech assignment), to the point that examples like (1) are stock in trade:

(1) b. Visiting relatives can be annoying.

A well-constructed grammar should be expected to return multiple parses for each ambiguous sentence.

However, people are naturally very good at resolving ambiguity, which means most ambiguity is not apparent, even to linguists. It is only with the development of large-scale grammars that the sheer scale of ambiguity has become clear. For example, (2) might seem unambiguous, but there is a second reading, where *my favorite* is the topicalized object of *speak*, which would mean that town criers generally speak the speaker's favorite thing (perhaps a language) clearly. There is also a third, even more implausible reading, where *my favorite town* is the topicalized object. Such implausible readings don't easily come to mind, and in fact, the 2018 version of the English Resource Grammar (ERG; Flickinger 2000; 2011) gives a total of 21 readings for this sentence. With increasingly long sentences, such ambiguities stack up very quickly. For (3), the first line of a newspaper article,<sup>10</sup> the ERG gives 35,094 readings.


While exploring ambiguity can be interesting for a linguist, typical practical applications require just one parse per input sentence and specifically the parse

<sup>10</sup>https://www.theguardian.com/science/2018/aug/22/offspring-of-neanderthal-and-denisovan-identified-for-first-time, accessed 2019-08-16.


that best reflects the intended meaning (or only the top few parses, in case the one put forward as "best" is wrong). Thus, what is required is a *ranking* of the parses, so that the application can use only the most highly-ranked parse, or the top few parses.

Parse ranking is not usually determined by the grammar itself, because of the difficulty of manually writing disambiguation rules.<sup>11</sup> Typically, a statistical system is used (Toutanova et al. 2002; 2005). First, a corpus is *treebanked*: for each sentence in the corpus, an annotator (often the grammar writer) chooses the best parse, out of all parses produced by the grammar. The set of all parses for a sentence is often referred to as the *parse forest*, and the selected best parse is often referred to as the *gold standard* or *gold parse*. Given the gold parses for the whole corpus, a statistical system is trained to predict the gold parse from a parse forest, based on many features<sup>12</sup> of the parse. From the example in (2), a number of different features all influence the preferred interpretation: the likelihood of a construction (such as topicalization), the likelihood of a valence frame (such as transitive *speak*), the likelihood of a collocation (such as *town crier*), the likelihood of a semantic relation (such as speaking a town), and so on.
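One common choice for such a statistical system is a log-linear (maximum entropy) model over the parse forest. The sketch below shows only the scoring step; `extract_features` and the trained `weights` are placeholders, and real systems such as those of Toutanova et al. (2002; 2005) differ in many details.

```python
import math

def rank_parses(parse_forest, extract_features, weights):
    """Rank parses by a weighted sum of their features.

    extract_features: maps a parse to a dict of feature counts (in the
    machine-learning sense of "feature"; see footnote 12).
    weights: maps feature names to weights learned from a treebank.
    """
    if not parse_forest:
        return []
    def score(parse):
        return sum(weights.get(name, 0.0) * count
                   for name, count in extract_features(parse).items())
    ranked = sorted(parse_forest, key=score, reverse=True)
    # A softmax (shifted by the top score for numerical stability) turns
    # the scores into a probability distribution over the forest.
    top = score(ranked[0])
    exps = [math.exp(score(p) - top) for p in ranked]
    total = sum(exps)
    return [(p, e / total) for p, e in zip(ranked, exps)]
```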

Because of the large number of possible parses, it can be helpful to *prune* the search space: rather than ranking the full set of parses, ranking is restricted to a smaller set of parses. Carefully choosing how to restrict the parser's attention can drastically reduce processing time without hurting parsing accuracy, as long as the algorithm for selecting the subset includes the correct parse sufficiently frequently. One method, called *supertagging*,<sup>13</sup> exploits the fact that HPSG is a lexicalized theory: choosing the correct lexical entry for each token brings in rich information that can be exploited to rule out many possible parses. Thus, if the correct lexical entry can be chosen prior to parsing (e.g., on the basis of the preceding and following words), the range of possible analyses the parser must consider is drastically reduced. Although there is a chance that the supertagger will predict the wrong lexical entry, using a supertagger can often improve parsing accuracy by ruling out parses that the parse-ranking model might incorrectly

<sup>11</sup>In fact, in earlier work, this task was undertaken by hand. One of the authors (Bender) had the job of maintaining rule weights in addition to developing the Jacy grammar (Siegel, Bender & Bond 2016) at YY Technologies in 2001–2002. No systematic methodology for determining appropriate weights was available and the system was both extremely brittle (sensitive to any changes in the grammar) and next to impossible to maintain.

<sup>12</sup>In the machine-learning sense of *feature*, not the feature-structure sense.

<sup>13</sup>The term *supertagging*, coined by Bangalore & Joshi (1999), alludes to *part-of-speech tagging*, which predicts a part of speech for each input token, from a relatively small set of part-of-speech tags. Supertagging is "super" in that it predicts detailed lexical entries, rather than simple parts of speech.


rank too high. Supertagging was first applied to HPSG by Matsuzaki et al. (2007), building on previous work for TAG (Bangalore & Joshi 1999) and CCG (Clark & Curran 2004). To allow multiword expressions (such as *by and large*), where the grammar assigns a single lexical entry to multiple tokens, Dridan (2013) proposes an extension of supertagging, called *ubertagging*, which jointly predicts both a segmentation of the input and supertags for those segments. Dridan manages to increase parsing speed by a factor of four, while also improving parsing accuracy.
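The pruning step itself is straightforward, as the following sketch shows. The `supertagger` callable, assumed to return a probability distribution over lexical-entry identifiers for each token, and the threshold value are placeholders.

```python
def prune_lexicon(tokens, lexicon, supertagger, threshold=0.01):
    """Keep only the lexical entries the supertagger finds plausible.

    supertagger(tokens) is assumed to return, for each token position, a
    dict mapping lexical-entry identifiers to probabilities.
    """
    tag_probs = supertagger(tokens)
    pruned = []
    for i, token in enumerate(tokens):
        entries = lexicon.get(token, set())
        kept = {e for e in entries if tag_probs[i].get(e, 0.0) >= threshold}
        pruned.append(kept or entries)   # back off if everything was pruned
    return pruned
```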

Finally, in order to train these statistical systems, we need to first annotate a treebank. When there are many parses for a sentence, it can be time-consuming to select the best one. To efficiently use an annotator's time, it can be helpful to use *discriminants*: properties which hold for some parses but not for others (Carter 1997). For example, discriminants might include whether to analyze an ambiguous token as a noun or a verb, or where to attach a prepositional phrase. This approach to treebanking also means that annotations can be re-used when the grammar is updated (Oepen et al. 2004; Flickinger et al. 2017). For more on treebanking, see Section 4.1.4.
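The logic of discriminants can be sketched as follows; representing parses as Python objects and discriminant properties as predicates over them is an illustrative simplification.

```python
def find_discriminants(parse_forest, properties):
    """Properties that hold for some parses but not others (Carter 1997).

    properties: maps a property name (e.g. "X is a noun") to a predicate
    over parses.
    """
    discriminants = {}
    for name, holds in properties.items():
        holding = [p for p in parse_forest if holds(p)]
        if 0 < len(holding) < len(parse_forest):  # the property splits the forest
            discriminants[name] = holds
    return discriminants

def apply_decision(parse_forest, holds, annotator_says_yes):
    """Narrow the forest according to one annotator decision."""
    return [p for p in parse_forest if holds(p) == annotator_says_yes]
```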

### **2.2.3 Semantic dependencies**

In practical applications of HPSG grammars, the full phrase-structure trees and the full feature structures are often unwieldy, containing far more information than is necessary for the task at hand. It is therefore often desirable to extract a concise semantic representation.

In computational linguistics, a popular approach to semantics is to represent the meaning of a sentence as a *dependency graph*, as this enables the use of graph-based algorithms.<sup>14</sup> Several types of dependency graph have been proposed based on Minimal Recursion Semantics (MRS; Copestake et al. 2005), with varying levels of simplification. Oepen & Lønning (2006) observe that if every predicate has a unique *intrinsic argument*, an MRS can be converted to a variable-free semantic representation by replacing each reference to a variable with a reference to the corresponding predicate. They present Elementary Dependency Structures (EDS): semantic graphs which maintain predicate-argument structure but discard some scope information. (For many applications, scope information is less important than predicate-argument structure.) Copestake (2009) builds on this idea to create a more expressive graph-based representation called Dependency Minimal Recursion Semantics (DMRS), which is fully interconvertible

<sup>14</sup>In this section, we are concerned with *semantic* dependencies. For *syntactic* dependencies, see Hudson (2021), Chapter 31 of this volume. Some practical applications of HPSG use syntactic dependencies (including many applications of the Alpino grammar, discussed in Section 3.3.1).


with MRS.<sup>15</sup> This expressivity is achieved by adding annotations on the edges to indicate scope information. Finally, DELPH-IN MRS Dependencies (DM; Ivanova et al. 2012) express predicate-argument structure purely in terms of the surface tokens, without introducing any abstract predicates.
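The conversion observed by Oepen & Lønning (2006) can be illustrated with a toy example, with predications reduced to (predicate, arguments) pairs and ARG0 taken as the intrinsic argument; quantifiers and scope are ignored here.

```python
def to_dependencies(predications):
    """Convert a toy MRS to variable-free dependencies (cf. EDS).

    predications: list of (predicate, {role: variable}) pairs, where the
    ARG0 role is the intrinsic argument.
    """
    # Map each variable to the (unique) predication it is intrinsic to.
    intrinsic = {args["ARG0"]: i
                 for i, (pred, args) in enumerate(predications)
                 if "ARG0" in args}
    edges = []
    for i, (pred, args) in enumerate(predications):
        for role, var in args.items():
            if role != "ARG0" and var in intrinsic:
                edges.append((i, role, intrinsic[var]))  # predicate-to-predicate
    return edges

# Toy fragment of "the cherry tree blossomed" (quantifiers omitted):
preds = [("_tree_n_of", {"ARG0": "x1"}),
         ("_cherry_n_1", {"ARG0": "x2"}),
         ("compound", {"ARG0": "e2", "ARG1": "x1", "ARG2": "x2"}),
         ("_blossom_v_1", {"ARG0": "e1", "ARG1": "x1"})]
print(to_dependencies(preds))
# [(2, 'ARG1', 0), (2, 'ARG2', 1), (3, 'ARG1', 0)]
```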

For example, the English Resource Grammar (ERG) produces the MRS representation in (4) for the sentence *The cherry tree blossomed*. For simplicity, we have omitted some details, including features such as NUMBER and TENSE, individual constraints (ICONS), and the use of difference lists. By convention, DELPH-IN predicates beginning with an underscore correspond to a lexical item, and have a three-part format, consisting of a lemma, a part-of-speech tag, and (optionally) a sense. Predicates without an initial underscore are abstract predicates. The *qeq* constraints (equality modulo quantifiers) are scopal relationships, where quantifiers may possibly intervene (for details, see Copestake et al. 2005 or Koenig & Richter 2021, Chapter 22 of this volume).

For readability, it can be easier to express an MRS in a more abstract mathematical form, as shown in (5). This is equivalent to the feature structure in (4).

<sup>15</sup>More precisely, for DMRS and MRS to be fully interconvertible, every predicate (except for quantifiers) must have an intrinsic argument, and every variable must be the intrinsic argument of exactly one predicate.


(5) INDEX: *e*1
ℓ1: *\_the\_q*(*x*1, *h*1, *h*2), *h*1 QEQ ℓ4
ℓ2: *udef\_q*(*x*2, *h*3, *h*4), *h*3 QEQ ℓ3
ℓ3: *\_cherry\_n\_1*(*x*2)
ℓ4: *\_tree\_n\_of*(*x*1), *compound*(*e*2, *x*1, *x*2)
LTOP, ℓ5: *\_blossom\_v\_1*(*e*1, *x*1)

The corresponding DMRS representation is shown in (6). This captures all of the information in the MRS in (5). Predicates are represented as nodes, while semantic roles and scopal constraints are represented as directed edges, called *dependencies* or *links*. Each dependency has two labels. The first is an argument label, such as ARG1, ARG2, or RSTR (the restriction of a quantifier). The second is a scopal constraint, such as QEQ,<sup>16</sup> EQ (the linked nodes share a label in the MRS, which is generally true for modifiers), or NEQ (the linked nodes don't share a label).

(6) [DMRS graph for *The cherry tree blossomed*: nodes for the predicates *\_the\_q*, *udef\_q*, *\_cherry\_n\_1*, *\_tree\_n\_of*, *compound*, and *\_blossom\_v\_1*, connected by labeled links such as RSTR/QEQ, ARG1/EQ, and ARG2/NEQ.]

Finally, the corresponding DM representation is shown in (7). This is a simplified version of MRS, where all nodes are tokens in the sentence. Some abstract predicates are dropped (such as *udef\_q*), while others are converted to dependencies (such as *compound*). Some scopal information is dropped (such as EQ vs. NEQ). The label BV stands for the "bound variable" of a quantifier, equivalent to the RSTR/QEQ of DMRS.

(7) [DM graph for *The cherry tree blossomed*: the surface tokens *the*, *cherry*, *tree*, and *blossomed*, linked by dependencies including BV, compound, and ARG1.]

The existence of such dependency graph formalisms, as well as software packages to manipulate such graphs (e.g., Ivanova et al. 2012, Copestake et al. 2016, Hershcovich et al. 2019, or PyDelphin<sup>17</sup>), has made it easier to use HPSG grammars in a number of practical tasks, as we will discuss in Section 4.2.

<sup>16</sup>An alternative notation is to write /H instead of /QEQ.

<sup>17</sup>https://github.com/delph-in/pydelphin/, accessed 2019-08-16.
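As a small illustration of such software support, the following shows how an MRS serialization might be read and converted to DMRS with PyDelphin. This sketch assumes the PyDelphin 1.x module layout; exact module paths and function names may differ across versions.

```python
# pip install pydelphin   (module layout as in PyDelphin 1.x)
from delphin.codecs import simplemrs
from delphin import dmrs

text = '''
[ TOP: h0 INDEX: e2
  RELS: < [ _rain_v_1 LBL: h1 ARG0: e2 ] >
  HCONS: < h0 qeq h1 > ]
'''
m = simplemrs.decode(text)   # read the SimpleMRS serialization
d = dmrs.from_mrs(m)         # convert to a DMRS dependency graph
print([node.predicate for node in d.nodes])   # ['_rain_v_1']
```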


# **3 Development of HPSG resources**

In this section we describe various projects that have developed computational resources on the basis of or inspired by HPSG. As we will discuss in Section 4 below, such resources can be used both in linguistic hypothesis testing and in various practical applications. The intended purpose of the resources influences the form that they take. The CoreGram Project (Section 3.1) and Babel (Section 3.3.3) primarily target linguistic hypothesis testing, the Alpino and Enju parsers (Sections 3.3.1 and 3.3.2) primarily target practical applications, and the DELPH-IN Consortium (Section 3.2) attempts to balance these two goals.

# **3.1 CoreGram**

The CoreGram<sup>18</sup> Project aims to produce large-scale HPSG grammars, which share a common "core" grammar (Müller 2015). At the time of writing, large grammars have been produced for German (Müller 2007), Danish (Müller & Ørsnes 2015), Persian (Müller & Ghayoomi 2010), Maltese (Müller 2009), and Mandarin Chinese (Müller & Lipenkova 2013). Smaller grammars are also available for English, Yiddish, Spanish, French, and Hindi.

All grammars are implemented in the TRALE system (Meurers et al. 2002; Penn 2004), which accommodates a wide range of technical devices proposed in the literature, including phonologically empty elements, relational constraints, implications with complex antecedents, and cyclic feature structures. It also accommodates macros and an expressive morphological component. Melnik (2007) observes that, compared to other platforms like the LKB (see Section 3.2 below), this allows grammar engineers to directly implement a wider range of theoretical proposals.

An important part of CoreGram is the sharing of grammatical constraints across grammars. Some general constraints hold for all grammars, while others hold for a subset of the grammars, and some only hold for a single grammar. Müller (2015) describes this as a "bottom-up approach with cheating" (p. 43): the aim is to analyze each language on its own terms (hence "bottom-up"), but to re-use analyses from existing grammars if possible (hence "with cheating"). The use of a core set of constraints is motivated not just on practical grounds, but also on theoretical ones. By developing multiple grammars in parallel, analyses can be improved by cross-linguistic comparison. The constraints encoded in the core grammar can be seen as a hypothesis about the structure of human language, as we will discuss in Section 4.1.1.

<sup>18</sup>https://hpsg.hu-berlin.de/Projects/CoreGram.html, accessed 2021-06-11.


CoreGram grammar development aims to incrementally increase coverage of each language. To measure progress, grammars are evaluated against test suites: collections of sentences each annotated with a grammaticality judgment (Oepen et al. 1998; Müller 2004b). This allows a grammarian to check for unexpected side effects when modifying a grammar and to avoid situations in which implementing an analysis of one phenomenon would break the analysis of another phenomenon. This is particularly important when modifying a constraint that is used by several grammars. To help achieve these aims, grammar development is supported by a range of software tools, including the test suite tool [incr tsdb()] (Oepen 2001; see also Section 3.2), and the graphical debugging tool Kahina (Dellert et al. 2010; 2013).
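A minimal regression check of this kind can be sketched as follows. The `parse` callable (returning the number of analyses found for a sentence) and the shape of the test suite are simplifying assumptions; tools like [incr tsdb()] track far more information.

```python
def regression_check(parse, test_suite):
    """Compare a grammar's behavior against annotated judgments.

    test_suite: list of (sentence, grammatical) pairs, where grammatical
    is the annotated judgment (True or False).
    """
    failures = []
    for sentence, grammatical in test_suite:
        n_parses = parse(sentence)
        if grammatical and n_parses == 0:
            failures.append(("undergeneration", sentence))   # coverage lost
        elif not grammatical and n_parses > 0:
            failures.append(("overgeneration", sentence))    # spurious analyses
    return failures
```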

# **3.2 The DELPH-IN Consortium**

The DELPH-IN<sup>19</sup> Consortium was established in 2001 to facilitate the development of large-scale, linguistically motivated HPSG grammars for multiple languages, in tandem with the software required for developing them and deploying them in practical applications. At the time when DELPH-IN was founded, the ERG (Flickinger 2000; 2011) had already been under development for eight years, and the Verb*mobil* project (Wahlster 2000) had also spurred the development of grammars for German (GG; Müller & Kasper 2000; Crysmann 2003) and Japanese (Jacy; Siegel, Bender & Bond 2016). Project DeepThought (Callmeier, Eisele, Schäfer & Siegel 2004) was exploring methodologies for combining deep and shallow processing in practical applications across multiple languages. This inspired the development of the LinGO Grammar Matrix (Bender, Flickinger & Oepen 2002), which began as a core grammar, consisting of constraints hypothesized to be cross-linguistically useful, abstracted out of the ERG with reference to Jacy and GG. The goal of the Grammar Matrix is to serve as a starting point for the development of new grammars, making it easy to reuse what has been learned in the development of existing grammars. In the years since, it has been extended to include "libraries" of analyses of cross-linguistically variable phenomena (e.g., Drellishak 2009; Bender et al. 2010).

DELPH-IN provides infrastructure (version control repositories, mailing lists, annual meetings) and an emphasis on open-source distribution of resources, both of which support the collaboration of a global network of researchers working on interoperable components. Such components include repositories of linguistic

<sup>19</sup>DELPH-IN stands for DEep Linguistic Processing in HPSG INitiative; see http://www.delph-in.net, accessed 2019-08-16.


knowledge, that is, both grammars and meta-grammars (including the Matrix and CLIMB, Fokkens 2014); processing engines that apply that knowledge for parsing and generation (discussed further below); software for supporting the development of grammar documentation (e.g., Hashimoto et al. 2008); software for creating treebanks (Oepen et al. 2004; Packard 2015; see also Section 4.1.4 below); parse ranking models trained on these treebanks (Toutanova et al. 2005; see also Section 2.2.2 above); and software for robust processing, i.e., using the knowledge encoded in the grammars to return analyses for sentences even if the grammar deems them ungrammatical (Zhang & Krieger 2011; Buys & Blunsom 2017; Chen et al. 2018).

A key accomplishment of the DELPH-IN Consortium is the standardization of a formalism for the declaration of grammars (Copestake 2002a), a formalism for the semantic representations (Copestake et al. 2005), and file formats for the storage and interchange of grammar outputs (e.g., parse forests, as well as the results of treebanking; Oepen 2001; Oepen et al. 2004). These standards facilitate the development of multiple different parsing and generation engines which can all process the same grammars, including, so far, the LKB (Copestake 2002b), PET (Callmeier 2000), ACE,<sup>20</sup> and Agree (Slayden 2012); of multiple software systems for processing bulk grammar output, like [incr tsdb()] (Oepen 2001), art,<sup>21</sup> and PyDelphin;<sup>22</sup> and of multilingual downstream systems which can be adapted to additional languages by plugging in different grammars. These tools and standards have in turn helped support a thriving community of users who furthermore accumulate and share information about best practices. Melnik (2007: 234) credits this community and the tools it has developed as a key factor that makes grammar engineering with the DELPH-IN ecosystem more accessible to HPSG linguists, compared to other platforms like TRALE (see Section 3.1 above).

The DELPH-IN community maintains research interests in both linguistics and practical applications. The focus on linguistics means that DELPH-IN grammarians strive to create grammars which capture linguistic generalizations and model grammaticality. This, in turn, leads to grammars with lower ambiguity than one finds with treebank-trained grammars and, importantly, grammars which produce well-formed strings in generation. The focus on practical applications leads to several kinds of additional research goals. Practical applications require robust processing, which in turn requires methods for handling unknown words (e.g., Adolphs et al. 2008), methods for managing extra-grammatical mark-up in text

<sup>20</sup>http://sweaglesw.org/linguistics/ace/, accessed 2019-08-16.

<sup>21</sup>http://sweaglesw.org/linguistics/libtsdb/art.html, accessed 2019-08-16.

<sup>22</sup>https://github.com/delph-in/pydelphin/, accessed 2019-08-16.


such as in Wikipedia pages (e.g., Flickinger et al. 2010), and strategies for processing inputs that are ungrammatical, at least according to the grammar (e.g., Zhang & Krieger 2011; see also Section 4.2.3). Processing large quantities of text motivates performance innovations, such as supertagging or ubertagging (e.g., Matsuzaki et al. 2007; Dridan 2013; see also Section 2.2.2) to speed up processing times. Naturally occurring text can include very long sentences which can run up against processing limits. Supertagging helps here, too, but other strategies include *sentence chunking*, which is the task of breaking a long sentence into smaller ones without loss of meaning (Muszyńska 2016). Working with real-world text (rather than curated test suites designed for linguistic research only) requires the integration of external components such as morphological analyzers (e.g., Marimon 2013) and named entity recognizers (e.g., Waldron et al. 2006; Schäfer et al. 2008). As described in Section 2.2.2, working with real-world applications requires parse ranking (e.g., Toutanova et al. 2005), and similarly ranking of generator outputs (known as *realization ranking*; e.g., Velldal 2009). Finally, research on embedding broad-coverage grammars in practical applications inspires work towards making sure that the semantic representations can serve as a suitable interface for external components (e.g., Flickinger et al. 2005). These efforts are also valuable from a strictly linguistic point of view, i.e., one not concerned with practical applications. First, the broader the coverage of a grammar, the more linguistic phenomena it can be used to explore. Second, external constraints on the form of semantic representations provide useful guide points in the development of semantic analyses.

# **3.3 Other HPSG and HPSG-inspired broad-coverage grammars**

### **3.3.1 Alpino**

Alpino<sup>23</sup> is a broad-coverage grammar of Dutch (Bouma, van Noord & Malouf 2001; van Noord & Malouf 2005; van Noord 2006). The main motivation is practical: to provide coverage and accuracy comparable to state-of-the-art parsers for English. Nonetheless, it also includes theoretically interesting analyses, such as for cross-serial dependencies (Bouma & van Noord 1998). In addition to using hand-written rules, lexical information (such as subcategorization frames) has also been extracted from two existing lexicons, Celex (Baayen et al. 1995) and Parole (Kruyt & Dutilh 1997).

Alpino produces syntactic dependency graphs, following the annotation format of the Spoken Dutch Corpus (Oostdijk 2000). These dependencies are

<sup>23</sup>http://www.let.rug.nl/vannoord/alp/Alpino/, accessed 2019-08-16.


constructed directly in the feature-structure formalism, exploiting the fact that a feature structure can be formalized as a directed acyclic graph. Each lexical entry encodes a partial dependency graph, and these graphs are composed through phrase structure rules to give a dependency graph for a whole sentence.

Although these dependencies differ from the semantic dependencies discussed in Section 2.2.3, a common motivation is to make the representations easier to use in practical applications. To harmonize with other computational work on dependency parsing, Bouma & van Noord (2017) have also produced a mapping from this format to Universal Dependencies (UD; Nivre et al. 2016), as discussed in Section 4.1.4 below. Alpino uses a statistical model trained on a dependency treebank, and in fact the same statistical model can be used in both parsing and generation (de Kok et al. 2011).

### **3.3.2 Enju**

Enju<sup>24</sup> (Miyao et al. 2005) is a broad-coverage grammar of English, semi-automatically acquired from the Penn Treebank (Marcus et al. 1993). This approach aims to reduce the cost of writing a grammar by leveraging existing resources. The basic idea is that, by viewing Penn Treebank trees as partial specifications of HPSG analyses, it is possible to infer lexical entries.

Miyao et al. converted the relatively flat trees in the Penn Treebank to binary-branching trees, and percolated head information through the trees. They also had to convert analyses for certain constructions, including subject-control verbs, auxiliary verbs, coordination, and extracted arguments. Each converted tree can then be combined with a small set of hand-written HPSG schemata, to induce a lexical entry for each word in the sentence.
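The binarization step alone can be sketched as below, with trees represented as nested tuples. Head percolation and the construction-specific conversions are omitted, and the primed intermediate labels are an illustrative convention.

```python
def binarize(tree):
    """Convert a flat tree (label, child, ...) to binary branching.

    Extra children are folded into right-branching intermediate nodes
    with primed labels.
    """
    if isinstance(tree, str):                     # a leaf (token)
        return tree
    label, children = tree[0], [binarize(c) for c in tree[1:]]
    if len(children) <= 2:
        return (label, *children)
    node = (label + "'", children[-2], children[-1])
    for child in reversed(children[1:-2]):
        node = (label + "'", child, node)
    return (label, children[0], node)

print(binarize(("NP", "the", "cherry", "tree")))
# ('NP', 'the', ("NP'", 'cherry', 'tree'))
```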

The development of Enju has focused on performance in practical applications, and the grammar is supported by an efficient parser (Tsuruoka et al. 2004; Matsuzaki et al. 2007), using a probabilistic model for feature structures (Miyao & Tsujii 2008). Enju has been used in a variety of NLP tasks, as will be discussed in Section 4.2.2.

### **3.3.3 Babel**

Babel is a broad-coverage grammar of German (Müller 1996; 1999). One interesting feature of this grammar is that it makes extensive use of discontinuous constituents (Müller 2004a). Although this makes the worst-case parsing complexity

<sup>24</sup>http://www.nactem.ac.uk/enju/, accessed 2019-08-29.


much worse, parsing speed doesn't seem to suffer in practice. This mirrors the findings of Carroll (1994), discussed in Section 2.2.1 above.

# **4 Deployment of HPSG resources**

There are several different ways in which computational resources based on HPSG are used. In Section 4.1, we first consider applications furthering linguistic research, including both language documentation and linguistic hypothesis testing. Then, in Section 4.2, we consider applications outside of linguistics.

# **4.1 Language documentation and linguistic hypothesis testing**

As described by Müller (1999: 439), Bender (2008), and Bender et al. (2011), grammar engineering — that is, the building of grammars in software — is an essential technique for testing linguistic hypotheses at scale. By "at scale", we mean both against large quantities of data and as integrated models of language that handle multiple phenomena at once. In this section, we review how this is done in the CoreGram and Grammar Matrix projects for cross-linguistic hypothesis testing, and in the AGGREGATION project in the context of language documentation.<sup>25</sup>

### **4.1.1 CoreGram**

As described in Section 3.1, the CoreGram project develops grammars for a diverse set of languages, and shares constraints across grammars in a bottom-up fashion, so that more similar languages share more constraints. There are constraints shared across all of the grammars in the project which can be seen as a hypothesis about properties shared by all languages. Whenever the CoreGram project expands to cover a new language, it can be seen as a test of this hypothesis.

For example, the most general constraint set allows a language to have V2 word order (as exemplified by Germanic languages), but rules out verb-penultimate word order, as discussed by Müller (2015: 45–46) (see also Müller 2021a, Chapter 10 of this volume on constituent order and Borsley & Crysmann 2021,

<sup>25</sup>Grammar engineering is not specific to HPSG and in fact has a history going back to at least the early 1960s (Kay 1963; Zwicky et al. 1965; Petrick 1965; Friedman et al. 1971) and modern work in grammar engineering includes work in many different frameworks, such as Lexical Functional Grammar (Butt et al. 1999), Combinatory Categorial Grammar (Baldridge et al. 2007), Grammatical Framework (Ranta 2009), and others. For reflections on grammar engineering for linguistic hypothesis testing in LFG, see Butt et al. (1999) and King (2016).


Chapter 13 of this volume on nonlocal dependencies). It also includes constraints for argument structure and linking (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume), as well as for information structure (see De Kuthy 2021, Chapter 23 of this volume).

### **4.1.2 Grammar Matrix**

As noted in Section 3.2, the LinGO Grammar Matrix (Bender et al. 2002; 2010) was initially developed in the context of Project DeepThought with the goal of speeding up the development of DELPH-IN-style grammars for additional languages. It consists of a shared core grammar and a series of "libraries" of analyses for cross-linguistically variable phenomena. Both of these constitute linguistic hypotheses: the constraints are hypothesized to be cross-linguistically useful. However, in the course of developing grammars based on the Matrix for specific languages, it is not uncommon to find reasons to refine the core grammar. The libraries, in turn, are intended to cover the attested range of variation for the phenomena they model. Languages that are not covered by the analyses in the libraries provide evidence that the libraries need to be extended or refined.

Grammar Matrix grammar development is less tightly coordinated than that of CoreGram (see Section 3.1): in the typical use case, grammar developers start from the Grammar Matrix, but with their own independent copy of the Matrix core grammar. This impedes somewhat the ability of the Matrix to adapt to the needs of various languages (unless grammar developers report back to the Matrix developers). On the other hand, the Matrix libraries represent an additional kind of linguistic hypothesis testing: each library on its own represents one linguistic phenomenon, but the libraries must be interoperable with each other. This is the cross-linguistic analogue of how monolingual implemented grammars allow linguists to ensure that analyses of different phenomena are interoperable (Müller 1999: 439–440; Bender 2008): the Grammar Matrix customization system allows its developers to test cross-linguistic libraries of analyses for interactions with other phenomena (Bender et al. 2011; Bender 2016). Without computational support (i.e., a computer keeping track of the constraints that make up each analysis, compiling them into specific grammars, and testing those grammars against test suites), this problem space would be too complex for exploration.

### **4.1.3 AGGREGATION**

In many ways, the most urgent need for computational support for linguistic hypothesis testing is the description of endangered languages. Implemented grammars can be used to process transcribed but unglossed text in order to find


relevant examples more quickly, both of phenomena that have already been analyzed and of phenomena that are as yet not well-understood.<sup>26</sup> Furthermore, treebanks constructed from implemented grammars can be tremendously valuable additions to language documentation (see Section 4.1.4 below). However, the process of building an implemented grammar is time-consuming, even with the start provided by a multilingual grammar engineering project like CoreGram, ParGram (Butt et al. 2002; King et al. 2005), the GF Resource Grammar Library (Ranta 2009), or the Grammar Matrix.

This is the motivation for the AGGREGATION<sup>27</sup> project, which starts from two observations: (1) descriptive linguists produce extremely rich annotations on data in the form of interlinear glossed text (IGT); and (2) the Grammar Matrix's libraries are accessed through a customization system which elicits a grammar specification in the form of a series of choices describing either high-level typological properties or specific constraints on lexical classes and lexical rules. The goal of AGGREGATION is to automatically produce such grammar specifications on the basis of information encoded in IGT, to be used by the Grammar Matrix customization system to produce language-particular grammars. AGGREGATION uses different approaches for different linguistic subsystems. For example, it learns morphotactics by observing morpheme order in the training data, and how to group affixes together into position classes based on measures of overlap of stems they attach to (Wax 2014; Zamaraeva et al. 2017). For many kinds of syntactic information, it leverages syntactic structure projected from the translation line (English, easily parsed with current tools) through the gloss line (which facilitates aligning the language and translation lines) to the language line (Xia & Lewis 2007; Georgi 2016). Using this projected information, the AGGREGATION system can detect case frames for verbs, word order patterns, etc. (Bender et al. 2013; Zamaraeva et al. 2019).<sup>28</sup>
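The affix-grouping step can be given a toy rendering as follows, greedily grouping affixes whose observed stem sets overlap. The Jaccard coefficient, the threshold, and the toy data are illustrative stand-ins for the measures actually used in the cited work.

```python
def group_position_classes(affix_stems, threshold=0.5):
    """Greedily group affixes by overlap of the stems they attach to.

    affix_stems: maps an affix to the set of stems it was observed
    attached to in the IGT training data.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    classes = []
    for affix, stems in affix_stems.items():
        for cls in classes:
            if any(jaccard(stems, affix_stems[other]) >= threshold
                   for other in cls):
                cls.append(affix)      # joins an existing position class
                break
        else:
            classes.append([affix])    # starts a new position class
    return classes

# Toy data: -a and -b attach to overlapping stems, -c to different ones.
print(group_position_classes({"-a": {"s1", "s2"},
                              "-b": {"s1", "s2", "s3"},
                              "-c": {"s9"}}))
# [['-a', '-b'], ['-c']]
```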

### **4.1.4 Treebanks and sembanks**

Annotated corpora are a particularly valuable type of resource that can be derived from HPSG grammars. Two important kinds are treebanks and sembanks. A *treebank* is a collection of text where each sentence is associated with a

<sup>26</sup>This methodology of using an implemented grammar as a sieve to sift the interesting examples out of corpora is demonstrated for English by Baldwin et al. (2005).

<sup>27</sup>http://depts.washington.edu/uwcl/aggregation/, accessed 2019-08-16.

<sup>28</sup>The TypeGram project (Hellan & Beermann 2014) is in a similar spirit. TypeGram provides methods of creating HPSG grammars by encoding specifications of valence and inflection in particularly rich IGT and then creating grammars based on those specifications.


syntactic representation. A *sembank* has semantic representations (in some cases in addition to the syntactic ones). Treebanks and sembanks can be used for linguistic research, as the analyses allow for more detailed structure-based searches for phenomena of interest (Rohde 2005; Ghodke & Bird 2010; Kouylekov & Oepen 2014).<sup>29</sup> In the context of language documentation and description, searchable treebanks can also be a valuable addition, helping readers connect prose descriptions of linguistic phenomena to multiple examples in the corpus (Bender et al. 2012). In natural language processing, treebanks and sembanks are critical source material for training parsers (see Sections 2.2.2 and 4.2.3).

Traditional treebanks are created by doing a certain amount of automatic processing on corpus data, including possibly chunking or context-free grammar parsing, and then hand-correcting the result (Marcus et al. 1993; Banarescu et al. 2013). While this approach is a means to encode human insight about linguistic structure for later automatic processing, it is both inefficient and potentially error-prone. The Alpino project (van der Beek et al. 2002; see also Section 3.3.1 above) addresses this by first parsing the text with a broad-coverage HPSG-inspired grammar of Dutch and then having annotators select among the parses. The selection process is facilitated by allowing the annotators to mark constituent boundaries and to mark lexical entries as correct, possibly correct, or wrong. These constraints reduce the search space for the parser and consequently also the range of analyses the annotator has to consider before choosing the best one. A facility for adding one-off lexical entries to handle misspellings, for example, helps increase grammar coverage. Disambiguation is handled with the aid of discriminants, as discussed in Section 2.2.2 above. Finally, the annotators may further edit analyses deemed insufficient. Though the underlying grammar is based on HPSG, the treebank stores dependency graphs instead. The Alpino parser was similarly used to construct the Lassy Treebanks of written Dutch (van Noord et al. 2013). In more recent work, these dependency representations have been mapped to the Universal Dependencies (UD) annotation standards (Nivre et al. 2016) to produce a UD treebank for Dutch (Bouma & van Noord 2017).

The Redwoods project (Oepen et al. 2004) also produces grammar-driven treebanks, in this case for English and without any post-editing of the selected analyses.<sup>30</sup> As with Alpino, this is done by first parsing the corpus with the grammar

<sup>29</sup>The WeSearch interface of Kouylekov & Oepen (2014) can be accessed at http://wesearch.delph-in.net/deepbank/search.jsp (accessed 2019-08-16).

<sup>30</sup>There are also Redwoods-style treebanks for other languages, including the Hinoki Treebank of Japanese (Bond et al. 2004) and the Tibidabo Treebank of Spanish (Marimon 2015).


and calculating the discriminants for each parse forest. After annotation, the treebanking software stores not only the final full HPSG analysis that was selected, but also the decisions the annotator made about each discriminant. Thus when the grammar is updated, for example to refine the semantic representations, the corpus can be reparsed and the decisions replayed, leaving only a small amount of further annotation work to be done to handle any additional ambiguity introduced by the grammar update. The activity of treebanking in turn provides useful insight into grammatical analyses, including sources of spurious ambiguity and phenomena that are not yet properly handled, and thus informs and spurs on further grammar development. A downside to strictly grammar-based treebanking is that only items for which the grammar finds a reasonable parse can be included in the treebank. For many applications, this is not a drawback, so long as there are sufficient and sufficiently varied sentences that do receive analyses.
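The replay step can be sketched as follows; representing stored discriminant decisions as (predicate, answer) pairs is a simplification of what the treebanking software actually records.

```python
def replay_decisions(new_forest, stored_decisions):
    """Re-apply stored discriminant decisions after a grammar update.

    stored_decisions: list of (holds, answer) pairs, where holds is a
    predicate over parses and answer is the annotator's yes/no decision.
    """
    forest = list(new_forest)
    for holds, answer in stored_decisions:
        narrowed = [p for p in forest if holds(p) == answer]
        if narrowed:          # skip decisions the update has made obsolete
            forest = narrowed
        if len(forest) == 1:
            break
    return forest             # any remaining ambiguity needs new decisions
```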

Finally, there are also automatically annotated treebanks, which use a statistical parse-ranking model to select the best parse, instead of using a human annotator. These are not as reliable as manually annotated treebanks, but they can be considerably larger. WikiWoods<sup>31</sup> covers 55 million sentences of English (900 million tokens). It was produced by Flickinger et al. (2010) and Solberg (2012) from the July 2008 dump of the full English Wikipedia, using the ERG and PET, with parse ranking trained on the manually treebanked subcorpus WeScience (Ytrestøl et al. 2009). As with the Redwoods treebanks, WikiWoods is updated with each release of the ERG.

# **4.2 Downstream applications**

In this section, we discuss the use of HPSG grammars for practical tasks. There is a large number of applications, and we focus on several important ones here. In Section 4.2.1, we cover educational applications where a grammar is used directly. In Section 4.2.2, we cover cases where a grammar is used to provide features to help solve tasks in Natural Language Processing (NLP). Finally, in Section 4.2.3, we cover situations where a grammar is used to provide data for machine learning systems.<sup>32</sup>

### **4.2.1 Education**

Precise syntactic analyses can be useful in language teaching, in order to automatically identify errors and give feedback to the student. In order to model

<sup>31</sup>http://moin.delph-in.net/WikiWoods, accessed 2019-08-16.

<sup>32</sup>The DELPH-IN community maintains an updated list of applications of DELPH-IN software and resources at http://moin.delph-in.net/DelphinApplications (accessed 2019-08-16).


common mistakes, a grammar can be extended with so-called *mal-rules*. A mal-rule is like a normal rule, in that it licenses a construction, and can be treated the same during parsing. However, given a parse, the presence of a mal-rule indicates that the student needs to be given feedback (Bender et al. 2004; Flickinger & Yu 2013; Morgado da Costa et al. 2016). A large-scale system implementing this kind of computer-aided teaching has been developed by the Education Program for Gifted Youth at Stanford University, using the ERG (Suppes et al. 2014). This system has reached tens of thousands of elementary and middle school children, and has been found to improve the school results of underachieving children.
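The feedback step can be sketched as follows, with derivations represented as nested tuples of rule names; the mal-rule name and message here are invented for illustration.

```python
def feedback_from_parse(derivation, mal_rule_messages):
    """Collect feedback messages for any mal-rules used in a derivation.

    derivation: nested tuples (rule_name, child, ...), with tokens as strings.
    mal_rule_messages: maps mal-rule names to feedback strings.
    """
    messages = []
    def walk(node):
        if isinstance(node, tuple):
            rule, *children = node
            if rule in mal_rule_messages:
                messages.append(mal_rule_messages[rule])
            for child in children:
                walk(child)
    walk(derivation)
    return messages

tree = ("subj-head", ("bare-np-mal-rule", "dog"), ("head", "barks"))
print(feedback_from_parse(tree, {
    "bare-np-mal-rule": "Singular count nouns need a determiner.",
}))   # ['Singular count nouns need a determiner.']
```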

Another way to use an implemented grammar is to automatically produce teaching materials. Given a semantic representation, a grammar can generate one or more sentences. Flickinger (2017) uses the ERG to produce practice exercises for a student learning first-order logic. For each exercise, the student is presented with an English sentence and is supposed to write down the corresponding first-order logical form. By using a grammar, the system can produce syntactically varied questions and automatically evaluate the student's answer.

### **4.2.2 NLP tasks**

Much NLP work focuses on specific *tasks*, where a system is presented with some input and required to produce an output, with a clearly-defined metric to determine how well the system performs. HPSG grammars have been used in a range of such tasks, where the syntactic and semantic analyses provide useful features.

*Information retrieval* is the task of finding relevant documents for a given query. For example, Schäfer et al. (2011) present a tool for searching the ACL Anthology, using the ERG. *Information extraction* is the task of identifying useful facts in a collection of documents. For example, Reiplinger et al. (2012) aim to identify definitions of technical concepts from English text, in order to automatically construct a glossary. They find that using the ERG reduces noise in the candidate definitions. Miyao et al. (2008) aim to identify protein-protein interactions in the English biomedical literature, using Enju.

For these tasks, some linguistic phenomena are particularly important, such as negation and hedging (including adverbs like *possibly*, modals like *may*, and verbs of speculation like *suggest*). When it comes to identifying facts asserted in a document, a clause that has been negated or hedged should be treated with caution. MacKinlay et al. (2012) consider the biomedical domain, evaluating on the BioNLP 2009 Shared Task (Kim et al. 2009), where they outperform previous approaches for negation, but not for speculation. Velldal et al. (2012) consider negation and speculation in biomedical text, evaluating on the CoNLL 2010 Shared Task (Farkas et al. 2010), where they outperform previous approaches.


Packard et al. (2014) propose a general-purpose method for finding the scope of negation in an MRS, evaluating on the \*SEM 2012 Shared Task (Morante & Blanco 2012). They find that transforming the output of the ERG with a relatively simple set of rules achieves high performance on this English dataset, and combining this approach with a purely statistical system outperforms previous approaches. Zamaraeva et al. (2018) use the ERG for negation detection and then use that information to refine the (machine-learning) features in a system that classifies English pathology reports, thereby improving system performance. A common finding from these studies is that a system using the output of the ERG tends to have high precision (items identified by the system tend to be correct) but low recall (items are often overlooked by the system). One reason for low recall is that the grammar does not cover all sentences in natural text. As we will see in Section 4.2.3, recent work on robust parsing may help to close this coverage gap.
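For concreteness, precision and recall over sets of identified items can be computed as below; the item identifiers are invented for illustration.

```python
def precision_recall(predicted, gold):
    """Precision and recall over sets of identified items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# High precision, low recall: few items identified, but correct ones.
print(precision_recall({"e1"}, {"e1", "e2", "e3"}))   # (1.0, 0.333...)
```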

Negation resolution is also included in Oepen et al.'s (2017) Shared Task on Extrinsic Parser Evaluation. As mentioned in Section 2.2.3, dependency graphs can provide a useful tool in NLP tasks, and this shared task aims to evaluate the use of dependency graphs (both semantic and syntactic) for three downstream applications: biomedical information extraction, negation resolution, and fine-grained opinion analysis. Some participating teams use DM dependencies (Schuster et al. 2017; Chen et al. 2017). The results of this shared task suggest that, compared to other dependency representations, DM is particularly useful for negation resolution.

Another task where dependency graphs have been used is *summarization*. Most existing work on this task focuses on so-called *extractive summarization*: given an input document, a system forms a summary by extracting short sections of the input. This is in contrast to *abstractive summarization*, where a system generates new text based on the input document. Extractive summarization is limited, but widely used because it is easier to implement. However, Fang et al. (2016) show how a wide-coverage grammar like the ERG makes it possible to implement an abstractive summarizer with state-of-the-art performance. After parsing the input document into logical propositions, the summarizer prunes the set of propositions using a cognitively inspired model. A summary is then generated based on the pruned set of propositions. Because no text is directly extracted from the input document, it is possible to generate a more concise summary.

Finally, no discussion of NLP tasks would be complete without including *machine translation*. A traditional grammar-based approach uses three grammars: a grammar for the source language, a grammar for the target language, and a *transfer grammar*, which converts semantic representations for the source language to semantic representations for the target language (Oepen et al. 2007; Bond et al. 2011). Translation proceeds in three steps: parse the source sentence,


transfer the semantic representation, and generate a target sentence. The transfer grammar is needed both to find appropriate lexical items and also to convert semantic representations when languages differ in how an idea might be expressed. The difficulty in writing a transfer grammar that is robust enough to deal with arbitrary input text means that statistical systems might be preferred. Horvat (2017) explores the use of statistical techniques, skipping over the transfer stage: a target-language sentence is generated directly from a semantic representation for the source language. Goodman (2018) explores the use of statistical techniques within the paradigm of parsing, transferring, and generating.

### **4.2.3 Data for machine learning**

In Section 4.2.2, we described how HPSG grammars can be directly incorporated into NLP systems. Another use of HPSG grammars in NLP is to generate data on which a statistical system can be trained.

For example, one limitation of using an HPSG grammar in an NLP system is that the grammar is unlikely to cover all sentences in the data (Flickinger et al. 2012). One way to overcome this coverage gap is to train a statistical system to produce the same output as the grammar. The idea is that the trained system will be able to generalize to sentences that the grammar does not cover. Oepen et al. (2014), Oepen et al. (2015), and Oepen et al. (2019) present shared tasks on semantic dependency parsing, including both DM dependencies and Enju predicate-argument structures. As of 2015, the best-performing systems in these shared tasks could already produce dependency graphs almost as accurately as grammar-based parsers (for sentences where the grammar has coverage). Similarly, Buys & Blunsom (2017) develop a parser for EDS and DMRS which performs almost as well as a grammar-based parser, but has full coverage, and can run 70 times faster.

In fact, in more recent work, the difference in performance has been effectively closed. Chen et al. (2018) consider parsing to EDS and DMRS graphs, and actually achieve slightly higher accuracy with their system, compared to a grammar-based parser. Unlike the previous statistical approaches, Chen et al. do not just train on the desired dependency graphs, but also use information in the phrase-structure trees. They suggest that using this information allows their system to learn compositional rules mirroring composition in the grammar, which thereby allows their system to generalize better.

Another application of HPSG-derived dependency graphs is for *distributional semantics*. Here, the aim is to learn the meanings of words from a corpus, exploiting the fact that the context of a word tells us something about its meaning. This is known as the *distributional hypothesis*, an idea with roots in American


structuralism (Harris 1954) and British lexicology (Firth 1951; 1957). Most work on distributional semantics learns a *vector space model*, where the meaning of each word is represented as a point in a high-dimensional vector space (for an overview, see Erk 2012 and Clark 2015). However, Emerson (2018) argues that vector space models cannot capture various aspects of meaning, such as logical structure, and phenomena like polysemy. Instead, Emerson presents a distributional model which can learn truth-conditional semantics, using a parsed corpus like WikiWoods (see Section 4.1.4). This approach relies on the semantic analyses given by a grammar, as well as the infrastructure to parse a large amount of text.
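A simple count-based version of this idea can be sketched as follows, building context vectors from semantic dependency triples of the kind extractable from a parsed corpus; the toy triples are invented, and Emerson's (2018) model is considerably more sophisticated than this.

```python
from collections import Counter, defaultdict

def count_vectors(dependency_triples):
    """Build count-based distributional vectors from dependency contexts.

    dependency_triples: iterable of (head, role, dependent) triples, e.g.
    extracted from a parsed corpus such as WikiWoods.
    """
    vectors = defaultdict(Counter)
    for head, role, dep in dependency_triples:
        vectors[head][(role, dep)] += 1           # head seen with this argument
        vectors[dep][(role + "-of", head)] += 1   # argument seen with this head
    return vectors

triples = [("_blossom_v_1", "ARG1", "_tree_n_of"),
           ("_water_v_1", "ARG2", "_tree_n_of")]
print(count_vectors(triples)["_tree_n_of"])
# Counter({('ARG1-of', '_blossom_v_1'): 1, ('ARG2-of', '_water_v_1'): 1})
```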

Finally, there are also applications which use grammars not to parse, but to generate. Kuhnle & Copestake (2018) consider the task of *visual question answering*, where a system is presented with an image and a question about the image, and must answer the question. This task requires language understanding, reference resolution, and grounded reasoning, in a way that is relatively well-defined. However, for many existing datasets, there are biases in the questions which mean that high performance can be achieved without true language understanding. For this reason, there is increasing interest in artificial datasets, which are controlled to make sure that high performance requires true understanding. Kuhnle & Copestake present ShapeWorld, a configurable system for generating artificial data. The system generates an abstract representation of a scene (colored shapes in different configurations), and then generates an image and a caption based on this representation. The use of a broad-coverage grammar is crucial in allowing the system to be configurable and scale across a variety of syntactic constructions.

# **5 Linguistic insights**

In Section 4.1 above, we described multiple ways in which computational methods can be used in the service of linguistic research, especially in testing linguistic hypotheses. Here, we highlight a few ways in which grammar engineering work in HPSG has turned up linguistic insights that had not previously been discovered through non-computational means.<sup>33</sup>

# **5.1 Ambiguity**

As discussed in Section 2.2.2, the scale of ambiguity has become clear now that broad-coverage precision grammars are available. By taking both coverage and

<sup>33</sup>For similar reflections from the point of view of LFG, see King (2016).


precision seriously, it is possible to investigate ambiguity on a large scale, quantifying its sources and the information needed to resolve it. For example, Toutanova et al. (2002; 2005) found that in the Redwoods treebank (3rd Growth), roughly half of the ambiguity was lexical, and half syntactic. They also showed how combining sources of information (such as both semantic and syntactic information) is important for resolving ambiguity, and argue that using multiple kinds of information in this way is consistent with probabilistic approaches in psycholinguistics.

# **5.2 Long-tail phenomena**

One of the strengths of HPSG as a theoretical framework is that it allows for the analysis of both "core" and "peripheral" phenomena within a single, integrated model. Indeed, by treebanking large corpora, it becomes possible to investigate the extent to which a particular phenomenon could be considered "core" or "peripheral" within a language. Furthermore, by implementing large-scale grammars across a range of languages, it also becomes possible to investigate the extent to which a phenomenon could be considered "core" or "peripheral" across languages (Müller 2014).

In fact, when working with actual data and large-scale grammars, it quickly becomes apparent just how long the long tail of "peripheral" phenomena is. Furthermore, the sustained development of broad-coverage linguistic resources makes it possible to bring into view more and more low-frequency phenomena (or low-frequency variations on relatively high-frequency phenomena). A case in point is the range of raising and control valence frames found in the ERG (Flickinger 2000; 2011). As of the 2018 release, the ERG includes over 60 types for raising and control predicates, including verbs, adjectives, and nouns, many of which are not otherwise discussed in the syntactic literature. These include such low-frequency types as the one for *incumbent*, which requires an expletive *it* subject, an obligatory *(up)on* PP complement, and an infinitival VP complement, and which establishes a control relation between the object of *on* and the VP's missing subject:<sup>34</sup>

(8) It is incumbent on you to speak plainly.

# **5.3 Analysis-order effects**

Grammar engineering means making analyses specific and then being able to build on them. This has both benefits and drawbacks: on the one hand, it means

<sup>34</sup>Our thanks to Dan Flickinger for this example.


that additional grammar engineering work can build directly on the results of previous work. It also means that any additional grammar engineering work is constrained by the work it is building on. Fokkens (2014) observes this phenomenon and notes that it introduces artifacts: the form an implemented grammar takes is partially the result of the order in which the grammar engineer considered phenomena to implement. This is probably also true for non-computational work, as theoretical ideas developed with particular phenomena (and, indeed, languages) in mind influence the questions with which researchers approach additional phenomena. Fokkens proposes that the methodology of *meta-grammar* engineering can be used to address this problem: using her CLIMB methodology, rather than deciding between analyses of a given phenomenon without input from later-studied phenomena, the grammar engineer can maintain multiple competing analyses through time and break free, at least partially, of the effects of the timeline of grammar development. The central idea is that the grammar writer develops a meta-grammar, like the Grammar Matrix customization system (see Section 4.1.2), but for a single language. This customization system maintains alternate analyses of particular phenomena which are invoked via grammar specifications so the different versions of the grammar can be compiled and tested.
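The following toy sketch (our illustration, not CLIMB's actual API) shows the core idea: alternate analyses of a phenomenon are stored side by side, and a grammar specification selects among them when a testable grammar is compiled.

```python
# Toy sketch of meta-grammar engineering: competing analyses coexist in
# the code base and are chosen at compilation time via a specification.
ANALYSES = {
    "word-order": {
        "head-initial": ["head-comp-phrase := ..."],  # placeholder rule strings
        "head-final": ["comp-head-phrase := ..."],
    },
}

def compile_grammar(spec):
    """Assemble a grammar from the chosen analysis of each phenomenon."""
    rules = []
    for phenomenon, choice in spec.items():
        rules.extend(ANALYSES[phenomenon][choice])
    return rules

# Both alternatives remain available throughout development:
grammar_a = compile_grammar({"word-order": "head-initial"})
grammar_b = compile_grammar({"word-order": "head-final"})
```

Keeping both alternatives compilable is what lets the engineer revisit an early analytic decision once later phenomena provide new evidence.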

# **6 Summary**

In this chapter, we have attempted to illuminate the landscape of computational work in HPSG. We have discussed how HPSG as a theory supports computational work, described large-scale computational projects that use HPSG, highlighted some applications of implemented grammars in HPSG, and explored ways in which computational work can inform linguistic research. This field is very active and our overview is necessarily incomplete. Nonetheless, it is our hope that the pointers and overview provided in this chapter will serve to help interested readers connect with ongoing research in computational linguistics using HPSG.

# **Acknowledgments**

We would like to thank Stephan Oepen for helpful comments on an early draft of this chapter, Stefan Müller for detailed comments as volume editor and Elizabeth Pankratz for careful copy editing.


# **References**


Zwicky, Arnold M., Joyce Friedman, Barbara C. Hall & Donald E. Walker. 1965. The MITRE syntactic analysis procedure for Transformational Grammars. In Robert W. Rector (ed.), *AFIPS conference proceedings: 1965 – fall joint computer conference*, vol. 27, 317–326. Washington, D.C.: Spartan Books. DOI: 10.1145/1463891.1463928.

# **Chapter 26**

# **Grammar in dialogue**

# Andy Lücking

Université de Paris, Goethe-Universität Frankfurt

# Jonathan Ginzburg

Université de Paris

# Robin Cooper

Göteborgs Universitet

This chapter portrays some phenomena, technical developments and discussions that are pertinent to analysing natural language use in face-to-face interaction from the perspective of HPSG and closely related frameworks. The use of the CONTEXT attribute to cover basic pragmatic aspects of meaning is sketched. With regard to the notion of common ground, it is shown how CONTEXT can be complemented by a dynamic update semantics. Furthermore, this chapter discusses challenges posed by dialogue data such as clarification requests to constraint-based, model-theoretic grammars. Responses to these challenges in terms of a type-theoretical underpinning (TTR, a Type Theory with Records) of both the semantic theory and the grammar formalism are reviewed. Finally, the dialogue theory *KoS* that emerged in this way from work in HPSG is sketched.

# **1 Introduction**

The archaeologists Ann Wesley and Ray Jones are working in an excavation hole, and Ray Jones is looking at the excavation map. Suddenly, Ray discovers a feature that catches his attention. He turns to his colleague Ann and initiates the following exchange (the example is slightly modified from Goodwin (2003: 222); underlined text is used to indicate overlap, italic comments in double round brackets

Andy Lücking, Jonathan Ginzburg & Robin Cooper. 2021. Grammar in dialogue. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1155– 1199. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599870


are used to describe non-verbal actions, numbers in brackets quantify the duration of pauses, double colons indicate prolongation, bold face represents stress, superscript circles indicate in/exhalation):


Contrast the archaeological dialogue from (1) with a third-person-perspective text on a related topic. In a recent archaeology paper, the excavation of the gallery grave Falköping stad 5, among others, is described (Blank et al. 2018: 4):

During excavation the grave was divided in different sections and layers and the finds were documented in these units. The bone material lacking stratographic and spatial information derives from the top layer […]. Both the antechamber and the chamber contained artefacts as well as human and animal skeletal remains, although most of the material was found in the chamber.

The differences between the archaeological dialogue and the paper are obvious and concern roughly the levels of *medium* (spoken vs. written), *situatedness* (degree of context dependence), *processing speed* (online vs. offline) and *standardization* (compliance with standard language norms) (Klein 1985). Attributing differences between dialogue and text simply to the medium (i.e. spoken vs. written) is tempting but insufficient. The corresponding characterizing features seem to form a continuum, as discussed under the terms *conceptual orality* and *conceptual literacy* in the (mainly German-speaking) literature for some time (Koch & Oesterreicher 1985). For example, much chat communication, although realized


by written inscriptions, exhibits many traits of (conceptually) spoken communication, as investigated, for instance, by means of chat corpora (Beißwenger et al. 2012). Face-to-face dialogue stands out due to a high degree of context dependence manifested in shared attention (Tomasello 1998; see also turns 2 and 12 between Ann and Ray), non-verbal actions such as hand and arm gestures (Kendon 2004; McNeill 2000; turn 10; cf. Lücking 2021, Chapter 27 of this volume for a brief overview of non-verbal communication means), disfluencies (Ginzburg et al. 2014; turns 5 to 8), non-sentential utterances (Fernández & Ginzburg 2002; Fernández et al. 2007; turns 1, 4, and 5), laughter (Ginzburg et al. 2015; turn 9), shared knowledge of interlocutors (Clark et al. 1983; turns 10–12), turn-taking (Sacks et al. 1974; Heldner & Edlund 2010; Levinson & Torreira 2015; e.g. question-answering in turns 1 and 4) and indirect reference (turn 10, where Ray points to an item on the map but refers to an archaeological artefact in the excavation hole). Note that such instances of deferred reference (Nunberg 1993) in situated communication actually differ from bridging anaphora (Clark 1975) in written texts, although they seem to be closely related at first glance. Bridging is a kind of indirect reference, too, where a definite noun phrase refers back to an antecedent entity which is not given in a strict sense, like *the goalkeeper* in *I watched the football match yesterday. The goalkeeper did an amazing save in overtime*. However, bridging NPs do not give rise to an index or demonstratum, which is the "deferring base" in case of indirect deixis (cf. Lücking 2018).

Since these phenomena are usually abstracted away from the linguistic knowledge encoded by a grammar, linguistics is said to exhibit a "written language bias" (Linell 2005). In fact, many of the phenomena exemplified above provide serious challenges to current linguistic theory, as has been argued by Ginzburg (2012), Ginzburg & Poesio (2016) and Kempson et al. (2016). So the question is: how serious is this bias? Is there a single language system with two modes, written and spoken (but obeying the qualifications we made above with respect to conceptual orality and literacy)? Or do written and spoken communication even realize different language systems? Responses can be given from different standpoints. When the competence/performance distinction was proposed (Chomsky 1965), one could claim that linguistic knowledge is more purely realized by the high degree of standardization manifested in written text, while speech is more likely to be affected by features attributed to performance (e.g. processing issues such as short term memory limitations or impaired production/perception). Once one attaches more importance to dialogical phenomena, one can also claim that there is a single, basic language system underlying written and spoken communication which bifurcates only in some cases, with interactivity and deixis


being salient examples (such a position is delineated but not embraced by Klein (1985); in fact, Klein remains neutral on this issue). Some even claim that "grammar is a system that characterizes talk in interaction" (Ginzburg & Poesio 2016: 1).<sup>1</sup> This position is strengthened by the primacy of spoken language in both ontogeny and language acquisition (on acquisition see Borsley & Müller 2021: Section 5.2, Chapter 28 of this volume).

Advances in dialogue semantics are compatible with the latter two positions, but their ramifications are inconsistent with the traditional competence/performance distinction (Ginzburg & Poesio 2016; Kempson et al. 2016). Beyond investigating phenomena which are especially related to people engaging in face-to-face interaction, dialogue semantics contributes to the theoretical (re)consideration of the linguistic competence that grammars encode. Some of the challenges posed by dialogue for the notion of linguistic knowledge – exemplified by non-sentential utterances such as clarification questions and reprise fragments (Fernández & Ginzburg 2002; Fernández et al. 2007) – also figure prominently in arguing *against* doing semantics within a unification-based framework (like Pollard & Sag 1987) and have implications for doing semantics in constraint-based frameworks (like Pollard & Sag 1994; see Section 3.1 below). In light of this, the relevant arguments are briefly reviewed below. As a consequence, we show how dialogue phenomena can be captured with a framework that departs from "classical" HPSG (i.e. HPSG as documented throughout this handbook). To this end, TTR (a Type Theory with Records) is introduced in Section 3.3. TTR is a strong competitor to other formalisms since it provides an account of semantics that covers dialogue phenomena from the outset. TTR also allows for "emulating" an HPSG kind of grammar, giving rise to a unified home for sign-based SYNSEM interfaces bridging to dialogue gameboards (covered in Section 4). To begin with, however, we give a brief historical review of pragmatics within HPSG.

# **2 From CONTEXT to update semantics for dialogue**

HPSG's interface to pragmatics is the CONTEXT attribute, which accommodates contextual constraints that have to be fulfilled in order for an

<sup>1</sup>The sign structure used in HPSG is partly motivated by the bilateral notion of sign of de Saussure. In this respect it is interesting to note that de Saussure also advocated the primacy of spoken language:

Language and writing are two distinct systems of signs; the second exists for the sole purpose of representing the first. The linguistic object is not both the written and the spoken forms of words; the spoken forms alone constitute the object. (de Saussure 2011: 23–24)

In this respect, de Saussure can be seen as an early voice *against* any written language bias.


expression to be used appropriately or felicitously (Austin 1962), to use a term from speech act theory (Pollard & Sag 1994: 27). The CONTEXT attribute has been used and extended to model the content of indexical and pronominal expressions (see Section 2.1), information packaging (Section 2.2) and shared background assumptions concerning standard meanings (Section 2.3). A further step from such pragmatic phenomena to dialogue semantics is achieved by making signs encode their dialogue context, leading to an architectural revision in terms of *update semantics* (see Section 2.4).

# **2.1 C-INDS and BACKGROUND**

The CONTEXT attribute introduces two sub-attributes, CONTEXTUAL-INDICES (C-INDS) and BACKGROUND. The C-INDS attribute values provide pointers to circumstantial features of the utterance situation such as speaker, addressee and time and location of speaking. Within the BACKGROUND attribute, assumptions such as presuppositions or conventional implicatures are expressed in terms of *psoas* (*parameterized states of affairs*; see Section 3.2 for some alternative semantic representation formats). For instance, it is part of the background information of the pronoun *she* of the "natural gender language" English that its referent is female (this does not hold for "grammatical gender languages" like French or German). In the HPSG format of Pollard & Sag (1994: 20), this constraint is expressed as in (2), where *nom* stands for *nominative*:

$$\text{(2)}\quad \begin{bmatrix} \text{CAT} & \big[\, \text{HEAD} \ \textit{noun}[\textit{nom}] \,\big] \\[4pt] \text{CONTENT} & \begin{bmatrix} \textit{ppro} \\ \text{INDEX} \ \boxed{1} \begin{bmatrix} \text{PER} \ \textit{3rd} \\ \text{NUM} \ \textit{sg} \\ \text{GEND} \ \textit{fem} \end{bmatrix} \end{bmatrix} \\[4pt] \text{CONTEXT} & \Big[\, \text{BACKGR} \ \big\{ \textit{female}\big(\boxed{1}\big) \big\} \,\Big] \end{bmatrix}$$

The CONTENT value is of type *ppro* (*personal-pronoun*), which is related to the NP type (+*pronominal*, −*anaphor*) from *Government and Binding* theory (Chomsky


1982: 78) and interacts with HPSG's Binding Theory (see Müller 2021a, Chapter 20 of this volume; see also Wechsler 2021: Section 4.1, Chapter 6 of this volume). The CONTENT/CONTEXT description in (2) claims that whatever the referent of the pronoun is, it has to be female.

The contextual indices that figure as values for the C-INDS attribute provide semantic values for indexical expressions. For instance, the referential meaning of the singular first person pronoun *I* is obtained by identifying the semantic index with the contextual index "speaker".<sup>2</sup> This use of CONTEXT is illustrated in (3), which is part of the lexical entry of *I*.

$$\text{(3)}\quad \begin{bmatrix} \textit{word} \\[2pt] \text{PHON} \ \langle \textit{I} \rangle \\[2pt] \text{SYNSEM|LOCAL} \ \begin{bmatrix} \text{CONTENT} \ \begin{bmatrix} \textit{ppro} \\ \text{INDEX} \ \boxed{1} \begin{bmatrix} \text{PER} \ \textit{1st} \\ \text{NUM} \ \textit{sg} \end{bmatrix} \\ \text{RESTR} \ \{\} \end{bmatrix} \\[2pt] \text{CONTEXT} \ \begin{bmatrix} \textit{context} \\ \text{C-INDS} \ \big[\, \text{SPEAKER} \ \boxed{1} \,\big] \end{bmatrix} \end{bmatrix} \end{bmatrix}$$

Inasmuch as the contextual anchors (see Barwise & Perry 1983: 72–73 or Devlin 1991: 52–63 on anchors in Situation Semantics) indicated by the boxed tag in (3) provide a semantic value for the speaker in a directly referential manner (see Marcus 1961 and Kripke 1980 on the notion of direct reference with regard to proper names), they also provide semantic values for the addressee (figuring in the content of *you*) as well as the time (*now*) and the place (*here*) of speaking.<sup>3</sup> Hence, the CONTEXT attribute accounts for the standard indexical expressions and provides a present tense marker needed for a semantics of tenses along the

<sup>2</sup>There are also indirect uses of *I*, where identification with the circumstantial speaker role would lead to wrong results. An example is a rental note attached to a truck, reading: *I am for rent*.

Here it is the truck, not the speaker, or rather the author of the note, that is for rent. Hence, the notion of *speaker* has to be extended to what counts as speaker in a given situation (Kratzer 1978: 26).

<sup>3</sup>Of these, in fact, only the speaker is straightforwardly given by the context; all others can potentially involve complex inference.


lines of *Discourse Representation Theory* (Kamp & Reyle 1993; see Partee 1973 on the preeminent role of an indexical time point). We will not discuss this issue further here (see Van Eynde 1998; 2000, Bonami 2002 and Costa & Branco 2012 for HPSG work on tense and aspect), but move on to briefly recapitulate other phenomena usually ascribed to pragmatics (see also Kathol et al. 2011: Section 5.2).

# **2.2 Information structure**

Focus, expressed by sentence accent in English, can be used for information packaging that may lead to truth-conditional differences even when the surface structures (i.e. strings; see Section 1 on a brief juxtaposition of spoken and written language) are the same (Halliday 1967). An example is given in (4), taken from Krifka (2008: 246), where capitalization indicates main accent and subscript "F" labels the focused constituent (see also Wasow 2021, Chapter 24 of this volume on incremental processing, also with respect to information structure):

	- a. John only showed Mary [the PICTURES]<sup>F</sup>.
	- b. John only showed [MARY]<sup>F</sup> the pictures.

An analysis of examples like (4) draws on an interplay of phonology, semantics, pragmatics and constituency and hence emphasizes in particular the advantages of the *fractal* architecture of HPSG (Johnson & Lappin 1999). HPSG has the fractal property since information about phonetic, syntactic and semantic aspects is present in every sign, from words to phrases and clauses (Pollard & Sag 1994: 3) – see also Kubota (2021), Chapter 29 of this volume, Borsley & Müller (2021), Chapter 28 of this volume, Müller (2021b), Chapter 32 of this volume, Wechsler & Asudeh (2021), Chapter 30 of this volume and Hudson (2021), Chapter 31 of this volume for a comparison of HPSG to other grammar theories; a benchmark source is Müller (2016).

At the core of information structure is a distinction between *given* and *new* information. Accordingly, information structure is often explicated in terms of dynamic semantics (ranging from *File Change Semantics* by Heim 2002 and *Discourse Representation Theory* by Kamp & Reyle 1993 to information state update semantics proper by Traum & Larsson 2003) – see for instance Krifka (2008) or Vallduví (2016) for a discussion and distinction of various notions bound up with information structure such as *focus*, *topic*, *ground* and *comment* seen from the perspective of dialogue content and dialogue management. The most influential approach to information structure within HPSG is that of Engdahl & Vallduví (1996). Here a distinction between *focus*, that is, new information, and *ground*,


the given information, is made (Engdahl & Vallduví 1996: 3). The *ground* is further bifurcated into LINK and TAIL, which connect to the preceding discourse in different ways (basically, the link corresponds to a discourse referent or file, and the tail corresponds to a predication which is already subsumed by the interlocutors' information states). The information packaging of the content values of a sentence is driven by phonetic information in terms of A-accent and B-accent (Jackendoff 1972: Chapter 6), where "A-stressed" constituents are coindexed with FOCUS elements and "B-stressed" ones are coindexed with LINK elements – see also De Kuthy (2021), Chapter 23 of this volume. The CONTEXT extension for information structure on this account is given in (5):

$$\text{(5)}\quad \begin{bmatrix} \text{CONTEXT} \ \begin{bmatrix} \text{INFO-STRUCT} \ \begin{bmatrix} \text{FOCUS} \ \dots \\ \text{GROUND} \ \begin{bmatrix} \text{LINK} \ \dots \\ \text{TAIL} \ \dots \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix}$$

Part of the analysis of the sample sentences from (4) is that in (4a), the CONTENT value of the direct object NP *the pictures* is the focused constituent, while it is the CONTENT value of the indirect object NP *Mary* in (4b). The FOCUS-LINK-TAIL approach works via structure sharing: the values of FOCUS, LINK and TAIL get instantiated by whatever means the language under consideration uses in order to tie up information packages (whether syntactic, phonological or something else besides). If prosodic information is utilized for signalling information structure, a grammar has to account for the fact that prosodic constituency is not isomorphic to syntactic constituency, that is, prosodic structures cannot be built up in parallel to syntactic trees. Within HPSG, the approach to *prosodic constituency* of Klein (2000) employs *metrical trees* independent of syntactic trees, but grammatical composition remains syntax-driven. The latter assumption is given up in the work of Haji-Abdolhosseini (2003). Starting from Klein's work, an architecture is developed that generalizes over prosody-syntax mismatches: on this account, syntax, phonology and information structure are parallel features of a common list of domain objects (usually the inflected word forms). Information structure realized by prosodic stress is also part of the speech-gesture interfaces within multimodal extensions of HPSG (cf. Lücking 2021: Section 3.5, Chapter 27 of this volume).


# **2.3 Mutual beliefs**

A strictly pragmatic view on meaning and reference is presented by Green (1996). Green provides a CONTEXT extension for the view that restrictions on the index actually are background assumptions concerning standard uses of referential expressions. One of the underlying observations is that people can, for example, use the word *dog* to refer to, say, toy dogs or even, given appropriate context information, to a remote control (we will come back to this example shortly). The fact that the word *dog* can be used without further ado successfully to refer to instances of the subspecies *Canis lupus familiaris*<sup>4</sup> is due to shared assumptions about the standard meaning of *dog*. Green represents this account in terms of mutual beliefs between EXPERIENCER and STANDARD as part of the background condition of the CONTEXT of referential NPs. Drawing on work by Cohen & Levesque (1990), *mutually-believe* is a recursive relation such that the experiencer believes a proposition, believes that the standard believes the proposition too, believes that the standard believes that the experiencer believes the proposition, and so on. When a proposition is mutually believed within a speech community, it is *normally believed*. The semantic part of the lexical structure of *dog* is given in (6). The analysis of proper names is pursued in a similar manner, amounting to the requirement that for a successful use of a proper name, the interlocutors have to know that the intended referent of this name actually bears the name in question.

<sup>4</sup>Green (1996: Example (73)) actually restricts the standard use of *dog* to the family *Canis* (repeated in our example (6)), which seems to be too permissive. The *Canis* family also includes foxes, coyotes and wolves, which are, outside of biological contexts, usually not described as being dogs. This indicates that the EXPERIENCER group should be further restricted and allowed to vary over different language communities and genres.
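The recursive structure of *mutually-believe* can be made concrete with a small sketch (our illustration, not Green's or Cohen & Levesque's formalism): each layer of the unfolding wraps the previous one in a further belief operator, with the experiencer outermost.

```python
# Toy unfolding of the recursive mutually-believe relation: layer k nests
# k belief operators, alternating between experiencer and standard.
def mutual_belief_layer(k, experiencer, standard, prop):
    """Return the k-th layer of the mutual-belief unfolding (k >= 1)."""
    agents = [experiencer, standard]
    expr = prop
    for i in reversed(range(k)):
        expr = f"believe({agents[i % 2]}, {expr})"
    return expr

for k in range(1, 4):
    print(mutual_belief_layer(k, "exp", "std", "standard-meaning(dog)"))
# believe(exp, standard-meaning(dog))
# believe(exp, believe(std, standard-meaning(dog)))
# believe(exp, believe(std, believe(exp, standard-meaning(dog))))
```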


Adding beliefs to CONTEXT provides the representational means to integrate (at least some kinds of) presuppositions, illocutionary force and deferred reference (Nunberg 1978) into grammar. However, a fuller model of speech acts and meaning transfers is still needed (Kathol et al. 2011: 94).

Taking a closer look at the argument underlying adding mutual beliefs to CONTEXT, one notices a striking similarity between shared assumptions about standard uses and *community membership* as a source for common ground (but see Footnote 4 for a hint on a possible refinement). However, community membership is just one of three sources of information on which the common ground between two interlocutors (scaling up to multilogue is obvious) can be based, according to Clark & Marshall (1981) and Clark et al. (1983):

The first is *perceptual evidence*, what the two have jointly experienced or are jointly experiencing at the moment. The second is *linguistic evidence*, what the two have jointly heard said or are now jointly hearing as participants in the same conversation. The third is *community membership*. They take as common ground everything they believe is universally, or almost universally, known, believed, or supposed in the many communities and subcommunities to which they mutually believe they both belong. (Clark et al. 1983: 247)

Reconsidering the "*dog*-used-to-refer-to-remote-control" example mentioned above: in order for this kind of reference to happen, one can imagine a preparatory sequence like the following:

(7) Can you please give me the … what's the name? … the … ah, let's call it "dog" … can you please give me the dog?

In this monologue, the speaker establishes a name for the remote control. After this name-giving, the situationally re-coined term can be used referentially (see Lücking et al. 2006 on situated conventions). Obviously, the felicity of reference is due to *linguistic evidence* provided and agreed upon in dialogical exchange. Dialogue contexts (Lee-Goldman 2011) and the dynamics of common ground constitute a dimension which is absent from the static CONTEXT representations surveyed above. This is where dynamic update semantics enters the stage.

# **2.4 Towards an update semantics for dialogue**

Starting from Stalnakerian contexts (Stalnaker 1978; see also Lewis 1979), that is, contexts which consist of mutually known propositions (also corresponding roughly to the mutual belief structures employed by Green 1996, cf. Section 2.3),


Ginzburg argues in a series of works that this context actually has a more elaborate structure (Ginzburg 1994; 1996; 1997). One motivation for this refinement is found in data like (8), an example given by Ginzburg (1994: 2) from the London-Lund corpus (Svartvik 1990).

	- 2. B: Which university?
	- 3. A: Cambridge.
	- 4. B: Cambridge, um.
	- 5. What did you read?
	- 6. A: History and English.
	- 7. B: History and English.

There is nothing remarkable about this dialogical exchange; it is a mundane piece of natural language interaction. However, given standard semantic assumptions and a *given-new* information structuring as sketched in Section 2.2, (8) poses two problems. The first problem is that one and the same word, namely *Cambridge*, plays a different role in different contexts as exemplified by turns 2 to 3 on the one hand and turns 3 to 4 on the other hand. The reason is that the first case instantiates a question-answering pair, where *Cambridge* provides the requested referent. The second case is an instance of *accept*: speaker B not only signals that she heard what A said (what is called *acknowledge*), but also that she updates her information state with a new piece of information (namely that A studied in Cambridge).

The second problem is that neither of B's turns 4 and 7 is redundant, although neither of them contributes new information (or *foci*) in the information-structural sense of Section 2.2: the turns just consist of a replication of A's answer. The reason for non-redundancy is obviously that in both cases the repetition manifests an *accept* move in the sense just explained.

In order to make grammatical sense out of such dialogue data – ultimately in terms of linguistic competence – contextual background rooted in language is insufficient, as discussed. The additional context structure required to differentiate the desired interpretation of (8) from redundant and co-text-insensitive ones is informally summarized by Ginzburg (1994: 4) in the following way:



Intuitively, turn 2 of the question-answer pair in turns 2 and 3 of (8) directly introduces a *question under discussion* – a semi-formal analysis is postponed to Section 4, which introduces the required background notions of *dialogue gameboards* and *conversational rules* which regiment dialogue gameboard updating. Given that in this case the *latest move* is a question, turn 3 is interpreted as an answer relating to the most recent question under discussion. This answer, however, is not simply added to the dialogue partners' common knowledge, that is, the *facts*. Rather, the receiver of the answer first has to *accept* the response offered to him – this is the dialogue reading of "It takes two to make a truth". After acceptance, the answer can be *grounded* (see Clark 1996: Chapter 4 for a discussion of common ground), that is, *facts* is *updated* with the proposition bound up with the given answer, and the resolved question under discussion is removed from the QUD list (*downdating*) – in a nutshell, this basic mechanism is also the motor that keeps the dialogue progressing. This mechanism entails an additional qualification compared to a static mutual belief context: dialogue update does not abstract over the individual dialogue partners. A dialogue move does not present the same content to each of the dialogue partners, nor does the occurrence of a move lead automatically to an update of the common ground (or mutual beliefs). Dialogue semantics accounts for this fact by distinguishing *public* from *private* information. Public information consists of observable linguistic behavior and its conventional interpretations, collected under the notion of *dialogue gameboard* (DGB). The DGB can be traced back to the *commitment-stores* of Hamblin (1970) that keep track of the commitments made at each turn by each speaker.

Private information is private since it corresponds to interlocutors' mental states (MS). The final ingredient is that the (fourfold) dynamics between the interlocutors' dialogue gameboards and mental states unfolds in time, turn by turn. In sum, a minimal participant-sensitive model of dialogue contributions is a tuple of DGB and MS series of the form ⟨DGB × MS⟩<sup>+</sup> for each dialogue agent. Here the tuple represents a temporally ordered sequence of objects of a given type (i.e. DGB and MS in case of dialogue agents' information state models) which is witnessed by a *string* of respective events which is at least of length 1, as required by the "Kleene +" (see Cooper & Ginzburg 2015: Section 2.7 on a type-theoretical variant of the string theory of events of Fernando 2011).

Guided by a few dialogue-specific semantic phenomena, we moved from various extensions to CONTEXT to minimal participant models and updating/downdating dynamics. In Sections 3 and 4, further progress which mainly consists of inverting the theory's strategic orientation is reviewed: instead of extending HPSG in order to cover pragmatics and dialogue semantics, it is argued that there are reasons to start with an interactive semantic framework and then embed an HPSG variant therein.


In order to move on, a remaining issue has to be resolved: what happens if an addressee for some reason refuses to accept a contribution of the previous speaker? In this case, the addressee (now taking the speaker role) poses a *clarification request*. Clarification potential plays an important methodological role in the dialogue semantic business, as is exemplified in the following section.

# **3 Type-theoretical pragmatics and dialogue semantics**

A minimal primer for the rich type theory TTR is given in Section 3.3. But why should (dialogue) semantics make use of a type theory at all? In what follows, two sources of motivation are presented, one drawing on semantic data gained from the clarification potential of reprise fragments (Section 3.1), the other resulting from HPSG's struggle with connecting to semantic theories (Section 3.2).

# **3.1 Subsentential meanings: unification and constraint-satisfaction vs. reprise content**

In (9), B poses a clarification request in terms of a reprise fragment concerning the verb used by A (Ginzburg 2012: 115):

(9)
	- 1. A: Did Bo finagle a raise?
	- 2. B: Finagle?

The reprise fragment has at least two interpretations: it can query the phonetic component of the verb ("did I hear correctly that you said 'finagle'?"), or it can query the meaning of the verb ("what does 'finagle' mean?"). Both queried aspects are available as part of the PHON-SYNSEM structure of signs, emphasizing the significance of HPSG's fractal design (cf. the remark on fractality in Section 2.2). However, when B uses the reprise fragment to clarify the content of the expression reprised, then B queries *only* the meaning of the reprised fragment (Purver & Ginzburg 2004; Ginzburg & Purver 2012) – in our example (9), this is *finagle*. This can be seen when answers are given that target the head verb or the verb phrase (head verb plus direct object argument *a raise*):

(10)
	- a. Yeah, like wangle.
	- b. Yeah, he wangled a wage increase.

Of the continuations in (10), only the first provides an answer to B's clarification question in (9). The second continuation can also answer a clarification request, but that clarification request is *Finagle a raise?* That is, "[a] nominal fragment reprise question queries exactly the standard semantic content of the


fragment being reprised", which is the strong version of the *Reprise Content Hypothesis* put forth by Purver & Ginzburg (2004: 288).<sup>5</sup> In the case of the example given in (9), the content of the head verb is queried, and not the meaning of the verb phrase (verb plus direct object) or the sentence (verb plus direct object and subject), since they correspond to constructions that are larger than the reprised fragment. In other words, a reprise fragment allows us to access the meaning of any expression regardless of its syntactic degree of embedding. However, this is not what follows from unification-based semantics. Due to structure sharing, certain slots of a head are *identified* with semantic contributions of modifier or argument constructions (see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume on linking and Abeillé & Borsley 2021: Section 6.1, Chapter 1 of this volume on head-adjunct phrases). In the case of *finagle a raise*, this means that once the content of the VP is composed, the patient role (or whatever semantic composition means are employed – see Koenig & Richter 2021, Chapter 22 of this volume for an overview) of the verb *finagle* is instantiated by the semantic index contributed by *a raise*. At this stage one cannot recover the V content from the VP content – unification appears to be too strong a mechanism to provide contents at all levels as required by reprise fragments.

However, as Richter (2004a: Chapter 2) argues, unification is only required in order to provide a formal foundation for the *language-as-partial-information* paradigm of Pollard & Sag (1987) and its spin-offs. The *language-as-collection-of-total-objects* paradigm underlying Pollard & Sag (1994) and its derivatives is not in need of employing unification. Rather, grammars following this paradigm are model-theoretic, constraint-based grammars, resting on *Relational Speciate Re-entrant Language* (RSRL) as formal foundation (Richter 2004a via precursors like King 1999). The formalism RSRL in its most recent implementation (Richter 2004b) has the advantage that the models it describes can be interpreted in different ways.<sup>6</sup> On the one hand, it is compatible with the idea that grammars accumulate constraints that describe classes of (well-formed) linguistic objects, which in turn classify models of linguistic tokens (King 1999). On the other hand, it is compatible with the view that grammars describe linguistic types, where types are construed as equivalence classes of utterance tokens (Pollard 1999). On these accounts, a related argument applies nonetheless: once the constraints are accumulated that describe total objects with the PHON string *finagle a raise*, the superset of total objects corresponding to just *finagle* is not available any more. The implications of clarification data for any kind of grammar, in particular for semantics, seem to be that some mechanism is needed that keeps track of the se-

<sup>5</sup>The weak version (Purver & Ginzburg 2004: 287) only claims that a nominal fragment reprise question queries a part of the standard semantic content of the fragment being reprised.

<sup>6</sup>Richter (2019 p.c.); see also Richter (2021: Section 6), Chapter 3 of this volume.


mantic contribution of each constituent of complex linguistic objects such as the verb *finagle* within the verb phrase *finagle a raise*. We do not know of any such attempts within constraint-based grammars, nor of the possible formal intricacies that may be involved, however. In the following, therefore, the HPSG<sub>TTR</sub>/KoS framework that provides trackable constituents by means of labelled representations and a dialogue gameboard architecture is introduced. We should emphasize to the reader that at this point we leave the formal background of standard HPSG as documented in this book. We want to point this out since the subsequently used representations look deceptively similar to attribute-value matrices (the risk of confusion is known from the essentially identical representations employed within unification- and constraint-based HPSG variants). We see this as a consequence of the dynamics of theories when their empirical domain is extended; at best, it adds to the formal and conceptual controversies and developments that take place in HPSG anyway, as briefly sketched at the beginning of this paragraph. However, HPSG<sub>TTR</sub> aims at adopting most of HPSG's desirable features such as its fractal architecture, its sign-based set-up and its linking facility between different layers of grammatical description. To begin with, we want to further motivate the point of departure in terms of HPSG's semantic objects.

# **3.2 Semantic objects: data structures vs. types**

Aiming at a declarative characterization of natural languages, the model-theoretic set-up of HPSG has to define models for its domain of linguistic objects (Levine & Meurers 2006: Section 3; see also Richter 2021, Chapter 3 of this volume). In particular with regard to the values of the CONTENT and CONTEXT attributes, the crucial question is "how types in the [feature] logic should correspond to the semantic types being represented" (Penn 2000: 70). In order to provide an answer to this crucial question, one has to clarify what a semantic type is. This question, however, is perhaps even more far-reaching and intricate than the initial one, and following it further would lead us into a considerable diversion, probably even turning away from the actual point of the initial question (but for a recent related discussion on the status of propositions see King et al. 2014). A pragmatic interpretation of the crucial question is probably this: how do the types in the feature logic correspond to the semantic types employed in semantic theories? There is a justification for this restatement from the actual semantic practice in HPSG (cf. Koenig & Richter 2021, Chapter 22 of this volume).

For the purpose of the present discussion, a semantic theory can be conceived as consisting of two components, *semantic representations* and an extensional *domain* or *universe* within which the semantic representations are interpreted


(Zimmermann 2011; Kempson 2011). That is, another reformulation of the question is how the HPSG model theory is related to a semantic model theory. Further concreteness can be obtained by realizing that both kinds of theories aim to talk about the same extensional domain. Given this, the question becomes: how do HPSG's semantic representations correspond to the semantic representation of the semantic theory of choice? A closely related point is made by Penn (2000: 63): "A model-theoretic denotation could be constructed so that nodes, for example, are interpreted in a very heterogeneous universe of entities in the world, functions on those entities, abstract properties that they may have such as number and gender and whatever else is necessary – the model theories that currently exist for typed feature structures permit that […]". Formulating things in this way has a further advantage: the question is independent from other and diverging basic model-theoretic assumptions made in various versions of HPSG, namely whether the linguistic objects to model are types (Pollard & Sag 1994) or tokens (Pollard & Sag 1987) and whether they are total objects (Pollard & Sag 1994) or partial information (Carpenter 1992). However, such a semantic model-theoretic denotation of nodes is not available in many of the most influential versions of HPSG: the semantic structures of the HPSG version developed by Pollard & Sag (1994) rest on a situation-theoretic framework. However, the (parameterized) states of affairs used as semantic representations lack a direct model-theoretic interpretation; they have to be translated into situation-theoretic formulæ first (such a translation from typed feature structures to situation theory is developed by Ginzburg & Sag 2000: Section 3.6). That is, the semantic structures do not encode semantic entities; rather they are data structures that represent descriptions which in turn correspond to semantic objects. This is also the conclusion drawn by Penn. The quotation given above continues: "[…] but at that point feature structures are not being used as a formal device to represent knowledge but as a formal device to represent data structures that encode formal devices to represent knowledge" (Penn 2000: 63; see also the discussion given by Ginzburg 2012: Section 5.2.2).

There are two options in order to unite typed feature structures and semantic representations. The first is to use logical forms instead of (P)SOAs and by this means connect directly to truth-conditional semantics. This option makes use of what Penn (see above) calls a *heterogeneous universe*, since syntactic attributes receive a different extensional interpretation than semantic attributes (now consisting of first or second order logic formulæ). The second option is to resort to a homogeneous universe and take PHON-SYNSEM structures as objects in the world, as is done in type-theoretical frameworks – signs nonetheless stand out


from ordinary objects due to their CONT part, which makes them representational entities in the first place.

The first option, using logical forms instead of situation-semantic (P)SOAs, was initiated by Nerbonne (1992). The most fully worked out semantics for HPSG from this strand has been developed by Richter & Sailer, by providing a mechanism to use the higher-order Ty2 language for semantic descriptions (Richter & Sailer 1999). This approach has been worked out in terms of *Lexical Resource Semantics* (LRS), where logical forms are constructed in parallel with attribute-value matrices (Richter & Sailer 2004).

At this point we should insert a word on HPSG's most popular underspecification mechanism, namely *(Robust) Minimal Recursion Semantics* (Copestake, Flickinger, Pollard & Sag 2005; Copestake 2007). (R)MRS formulæ may have unfilled argument slots so that they can be assembled in various ways. However, resolving such underspecified representations is not part of the grammar formalism, so (R)MRS representations do not provide an autonomous semantic component for HPSG. Therefore, they do not address the representation problem under discussion as LRS does.

The second option, using the type-theoretical framework TTR, has been developed by Cooper (2008; 2014; 2021) and Ginzburg (2012). TTR, though looking similar to feature descriptions, directly provides semantic entities, namely types (Ginzburg 2012: Section 5.2.2). TTR also has a model-theoretic foundation (Cooper 2021), so it complies with the representation-domain format we drew upon above.

A dialogical view on grammar and meaning provides further insight into semantic topics such as quantified noun phrases. Relevant observations are reported by Purver & Ginzburg (2004) concerning the clarification potential of noun phrases. They discuss data like the following (bold face added):

(11) a. TERRY: Richard hit the ball on the car.
NICK: **What ball**? [≈ *What ball do you mean by 'the ball'?*]
TERRY: James [last name]'s football. (BNC file KR2, sentences 862, 865–866)
b. RICHARD: No I'll commute every day
ANON 6: **Every day?** [≈ *Is it every day you'll commute?* or *Which days do you mean by every day?*]
RICHARD: as if, er Saturday and Sunday
ANON 6: And all holidays?
RICHARD: Yeah [pause]


As attested in (11), the accepted answers which are given to the clarification requests are in terms of an *individual* with regard to *the ball* (11a) and in terms of *sets* with regard to *every day* in (11b). The expressions put to a clarification request (*the ball* and *every day*, respectively) are analyzed as *generalized quantifiers* in semantics (Montague 1973). A generalized quantifier, however, denotes a *set of sets*, which is at odds with its clarification potential in dialogue. Accordingly, in a series of works, a theory of quantified noun phrases (QNPs) has been developed that draws on the notion of *witness sets* (Barwise & Cooper 1981: 191) and analyzes QNPs in terms of the intuitively expected and clarificationally required denotations of types *individual* and *sets of individuals*, respectively (Purver & Ginzburg 2004; Ginzburg & Purver 2012; Ginzburg 2012; Cooper 2013; Lücking & Ginzburg 2018; Cooper 2021).
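For concreteness, the underlying definition of Barwise & Cooper (1981: 191) can be paraphrased as follows (our notation):

$$w \text{ is a witness set for a quantifier } D \text{ living on a set } A \text{ iff } w \subseteq A \text{ and } w \in D.$$

On this view, the only witness set for *every day* is the set of all (relevant) days, and a witness set for *the ball* is the singleton containing the contextually unique ball – matching the set-type and individual-type answers observed in (11).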

There are further distinguishing features between logical forms and type-theoretical entities, however. Types are intensional entities, so they directly provide belief objects, as touched upon in Section 2.3, which are needed for intensional readings such as those figuring in attitude reports, for example the assertion that *Flat Earthers believe that the earth is flat* (see also Cooper 2005a and Cooper 2021 on attitude reports in TTR).

Furthermore, TTR is not susceptible to the *slingshot argument* (Barwise & Perry 1983: 24–26): explicating propositional content on a Fregean account (Frege 1892) – that is, denoting the true or the false – in terms of sets of possible worlds is too coarse-grained, since two sentences which are both true (or false) but have nonetheless different meanings cannot be distinguished. In this regard, TTR provides a *structured theory of meaning*, where types are not traded for their extensions. Accordingly, a brief introduction to TTR is given in Section 3.3 and the architecture of the dialogue theory *KoS* incorporating a type-theoretic HPSG variant is sketched in Section 4.

# **3.3 A brief primer on TTR**

TTR, which builds on ideas in the intuitionistic Type Theory of Martin-Löf (1984) and its application to natural language semantics (see Ranta 2015), provides semantic objects at both the token and the type level and structures to organize these objects, namely records and record types (see Cooper 2005b, Cooper 2005a, Cooper 2012, Cooper 2017, and Cooper & Ginzburg 2015 for expositions). Records consist of fields of pairs of labels and objects, and record types consist of fields of pairs of labels and types, both of which can be nested (Cooper 2021). Take for instance the schematic record in (12):


$$\text{(12)}\quad \begin{bmatrix} l_0 = \begin{bmatrix} l_1 = o_1 \\ l_2 = o_2 \end{bmatrix} \\ l_3 = o_3 \\ \dots \end{bmatrix}$$

Here, $o_1$, $o_2$ and $o_3$ are (real-world) objects, which are labelled by $l_1$, $l_2$ and $l_3$, respectively ($o_1$ and $o_2$ are additionally part of a sub-record labelled $l_0$). Records can be *witnesses* for record types. For instance, the record from (12) is a witness for the record type in (13) only in the case that the objects from the record are of the types required by the record type (i.e. $o_1 : T_1$, $o_2 : T_2$, $o_3 : T_3$), where objects and types are paired by same labelling.

$$\text{(13)}\quad \begin{bmatrix} l_0 : \begin{bmatrix} l_1 : T_1 \\ l_2 : T_2 \end{bmatrix} \\ l_3 : T_3 \\ \dots \end{bmatrix}$$

The colon notation indicates a basic notion in TTR: a *judgement*. A judgement of the form $a : T$ means that object $a$ is of type $T$, or, put differently, that $a$ is a witness for $T$. Judgements are used to capture basic classifications like *Marc Chagall is an individual* ($mc$ : *Ind*), as well as propositional descriptions of situations like *The cat is on the mat* for the situation depicted in Figure 1, where Fritz the cat sits on mat m33. The record type for the example sentence (ignoring the semantic contribution of the definite article for the sake of exposition<sup>7</sup>) will be (14):

$$\text{(14)}\quad \begin{bmatrix} \text{x} & : \textit{Ind} \\ \text{c1} & : \textit{cat}(\text{x}) \\ \text{y} & : \textit{Ind} \\ \text{c2} & : \textit{mat}(\text{y}) \\ \text{c3} & : \textit{on}(\text{x},\text{y}) \end{bmatrix}$$
Note that the types labelled "c1", "c2", and "c3" in (14) are *dependent types*, since the veridicality of judgements involving these types depends on the objects that are assigned to the basic types labelled "x" and "y". A *witness* for the record type in (14) will be a *record* that provides suitable objects for each field of the record type (and possibly more). Obviously, the situation depicted in Figure 1 (adapted from Lücking 2018: 270) is a witness for the type in (14). The participants of the depicted situation can be thought of as situations themselves which show Fritz to be a cat, m33 to be a mat and Fritz to be on m33. The scene in the figure then corresponds to the following record, which is of the type expressed by the record type from (14):

<sup>7</sup>This record type corresponds to *a cat is on a mat*.



Figure 1: Fritz the cat sits on a mat.

$$\text{(15)}\quad \begin{bmatrix} \text{x} & = \textit{Fritz} \\ \text{c1} & = \textit{cat situation} \\ \text{y} & = \textit{m33} \\ \text{c2} & = \textit{mat situation} \\ \text{c3} & = \textit{relation situation} \end{bmatrix}$$
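The witnessing relation between the record in (15) and the record type in (14) can be emulated in a few lines of code. The following is a toy stand-in for TTR, not an implementation of it: dependent types are modelled as functions from the record to a truth value.

```python
# Toy encoding of the record (15) and record type (14); a record is a
# witness for a record type iff every field check of the type succeeds.
record = {"x": "Fritz", "y": "m33",
          "c1": ("cat", "Fritz"),
          "c2": ("mat", "m33"),
          "c3": ("on", "Fritz", "m33")}

record_type = {
    "x":  lambda r: isinstance(r["x"], str),            # x : Ind (toy check)
    "y":  lambda r: isinstance(r["y"], str),            # y : Ind
    "c1": lambda r: r["c1"] == ("cat", r["x"]),         # c1 : cat(x), dependent
    "c2": lambda r: r["c2"] == ("mat", r["y"]),         # c2 : mat(y), dependent
    "c3": lambda r: r["c3"] == ("on", r["x"], r["y"]),  # c3 : on(x, y), dependent
}

def is_witness(rec, rec_type):
    """rec : rec_type iff all fields of the type are satisfied by rec.
    The record may contain additional fields."""
    return all(label in rec and check(rec) for label, check in rec_type.items())

print(is_witness(record, record_type))  # True: (15) is a witness for (14)
```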

Using type constructors, various types can be built out of basic and complex (dependent) types, such as set types and list types. In order to provide two (slightly simplified) examples of type constructors that will be useful later on, we just mention *function types* and *singleton types* here.

(16)
	- a. If $T_1$ and $T_2$ are types, then $(T_1 \rightarrow T_2)$ is a type, namely the type of functions that map $T_1$ to $T_2$.
	- b. If a function $f$ is of type $(T_1 \rightarrow T_2)$, then $f$'s domain is $\{a \mid a : T_1\}$ and its range is included in $\{b \mid b : T_2\}$.

The characterization in (16) is that of a standard extensional notion of function. Given that TTR is an intensional semantic theory – that is, two types are different even if their extension is the same – other notions of function types could be developed.

(17)
	- a. If $T$ is a type and $a : T$ (i.e. object $a$ is of type $T$), then $T_a$ is a type.
	- b. $b : T_a$ (i.e. object $b$ is of type $T_a$) iff $b : T$ and $b = a$.

That is, a singleton type is singleton since it is the type of a specific object.

Since types are semantic objects in their own right (types are not defined by or reduced to their extensions), not only an object of type $T$ but also the type $T$ itself can be the value of a label. One way of expressing this is in terms of *manifest fields*. A type-manifest field is notated in the following way: $[\, l = T' : T \,]$, specifying that $l$ is the type $T'$. Analogously, object-manifest fields can be expressed by restricting the value of a label to a certain object.
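As a worked instance (our notation, building on the singleton types in (17)): an object-manifest field abbreviates a field whose type is a singleton type,

$$[\, l = a : T \,] \quad\text{abbreviates}\quad [\, l : T_a \,],$$

so that, for example, $[\,\text{x} = \textit{Fritz} : \textit{Ind}\,]$ is the type of records whose x-field is witnessed by Fritz in particular, not by just any individual.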


For more comprehensive and formal elaborations of TTR, see the references given at the beginning of this section, in particular Cooper (2021).

# **4 Putting things together: HPSG<sub>TTR</sub> and dialogue game boards**

Signs as construed within HPSG can be reconstructed as record types of a specific kind (Cooper 2008). For instance, (18) shows the record type (the judgement colon indicates that we now talk about TTR objects) for a general sign according to Pollard & Sag (1994) (where *PhonType*, *CategoryType* and *SemType* denote obvious types – see the Appendix for a minimal HPSG fragment defined in terms of TTR).

(18)
$$\begin{bmatrix} \text{PHON} & : \textit{list(PhonType)} \\[2pt] \text{SYNSEM} & : \Bigg[\, \text{LOCAL} : \begin{bmatrix} \text{CAT} & : \textit{CategoryType} \\ \text{CONTENT} & : \textit{SemObj} \\ \text{CONTEXT} & : \textit{RecType} \end{bmatrix} \,\Bigg] \end{bmatrix}$$

Signs are extended by an interface to circumstantial features of the utterance situation in terms of the DGB-PARAMS attribute, which corresponds to the C-INDS from Section 2.1. The attribute's name abbreviates *dialogue gameboard parameters*, since its values have to be instantiated (that is, witnessed) in the process of grounding. Thus, if the content of an NP is part of DGB-PARAMS, then it gets a referential interpretation. However, NPs need not be used referentially; there are what Donnellan (1966) calls *attributive uses*, as in *The thief* (*whoever he is*) *stole my credit card*. To this end, there is a "coercion" operation from DGB-PARAMS to Q-PARAMS (*quantificational parameters*) involving an abstraction from individuals to the NP's descriptive condition (Purver & Ginzburg 2004; see the Appendix for the respective operation).

These HPSG<sub>TTR</sub> signs figure as constituents within an architecture known as *dialogue gameboard*, giving rise to a grammar-dialogue interface within the dialogue theory *KoS* (Ginzburg 1994; 1996; 2003; 2012). A Dialogue Game Board (DGB) is an information-state-based sheet for describing communicative interactions. The DGB from KoS tracks the interlocutors (*spkr* and *addr* fields), a record of the dialogue history (*Moves*), dialogue moves that are in the process of grounding (*Pending*), the question(s) currently under discussion (*QUD*), the assumptions shared among the interlocutors (*Facts*) and the dialogue participant's view of the visual situation and attended entities (*VisualSit*). The TTR representation of a DGB following Ginzburg (2012) is given in (19), where *LocProp* is the type of a *locutionary proposition* (see (21) below) and *poset* abbreviates "partially ordered set".

$$\text{(19)}\quad \mathit{DGBType} =_{def} \begin{bmatrix} \text{SPKR} & : \mathit{Ind} \\ \text{ADDR} & : \mathit{Ind} \\ \text{MOVES} & : \mathit{list(LocProp)} \\ \text{PENDING} & : \mathit{list(LocProp)} \\ \text{QUD} & : \mathit{poset(Question)} \\ \text{FACTS} & : \mathit{Set(Prop)} \\ \text{VISUALSIT} & : \mathit{RecType} \end{bmatrix}$$
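Procedurally, the gameboard in (19) can be pictured as a record whose fields evolve over the course of the conversation. The following Python dataclass is a sketch under that reading; the field names follow (19), while everything else is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set

@dataclass
class DGB:
    """Toy dialogue gameboard mirroring the fields of (19)."""
    spkr: str = ""
    addr: str = ""
    moves: List[Any] = field(default_factory=list)    # latest move first
    pending: List[Any] = field(default_factory=list)  # ungrounded moves
    qud: List[Any] = field(default_factory=list)      # poset; max element first
    facts: Set[Any] = field(default_factory=set)      # shared assumptions
    visual_sit: Any = None                            # visual situation

gb = DGB(spkr="A", addr="B", facts={"cg0"})           # an initial gameboard
```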
TTR, like many HPSG variants (e.g. Pollard & Sag 1987 and Pollard & Sag 1994), employs a situation semantic domain (Cooper 2021). This involves propositions being modelled in terms of types of situations, not in terms of sets of possible worlds. Since TTR is a type theory, it offers at least two explications of proposition. On the one hand, propositions can be identified with types (Cooper 2005a). On the other hand, propositions can be developed in an explicit Austinian way (Austin 1950), where a proposition is individuated in terms of a situation and situation type (Ginzburg 2011: 845) – this is the truth-making (and Austin's original) interpretation of "It takes two to make a truth", since on Austin's conception a situation type can only be truth-evaluated against the situation it is about. We follow the latter option here. The type of propositions and the relation to a Situation Semantics conception of "true" (Barwise & Perry 1983) is given in (20):

$$\begin{aligned} \text{(20)} \quad & \text{a. } \mathit{Prop} =_{def} \begin{bmatrix} \text{SIT} & : \mathit{Record} \\ \text{SIT-TYPE} & : \mathit{RecType} \end{bmatrix} \\ & \text{b. A proposition } p = \begin{bmatrix} \text{SIT} & = s \\ \text{SIT-TYPE} & = T \end{bmatrix} \text{ is true iff } s : T. \end{aligned}$$

A special kind of proposition, namely *locutionary propositions* (*LocProp*) (Ginzburg 2012: 172), can be defined as follows:

$$\text{(21)}\quad \mathit{LocProp} =_{def} \begin{bmatrix} \text{SIGN} & : \mathit{Record} \\ \text{SIGN-TYPE} & : \mathit{RecType} \end{bmatrix}$$

Locutionary propositions are sign objects utilized to explicate clarification potential (see Section 3.1) and grounding.
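The truth definition in (20b) has a direct computational gloss: an Austinian proposition pairs a situation with a situation type and is true just in case the situation witnesses that type. A toy sketch (with predicates standing in for record types; all names are illustrative):

```python
# Austinian proposition: true iff its situation witnesses its situation type.
class Prop:
    def __init__(self, sit, sit_type):
        self.sit = sit              # a record-like object (here: a dict)
        self.sit_type = sit_type    # a predicate standing in for a RecType

    def true(self):
        return self.sit_type(self.sit)

s = {"agent": "kim", "activity": "run"}           # a toy situation
run_type = lambda r: r.get("activity") == "run"   # a toy situation type
assert Prop(s, run_type).true()
```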

Given the dialogue-awareness of signs just sketched, a content for interjections such as "EHHH HEHH", which constitutes turn 3 from the exchange between Ann and Ray in (1) at the beginning of this chapter, can be given. Intuitively, Ann signals with these sounds that she heard Ray's question, which in turn is neither grounded nor clarified at this point of the dialogue but is waiting for a response, which is called *pending*. This intuition can be made precise by means of the following lexical entry (which is closely related to the meaning of *mmh* given by Ginzburg 2012: 163):

$$\text{(22)}\quad \begin{bmatrix} \text{PHON} & : \mathit{ehh\ hehh} \\ \text{CAT} & : \begin{bmatrix} \text{HEAD} = \mathit{interjection} : \mathit{syncat} \end{bmatrix} \\ \text{DGB-PARAMS} & : \begin{bmatrix} \text{SPKR} & : \mathit{Ind} \\ \text{ADDR} & : \mathit{Ind} \\ \text{PENDING} & : \mathit{LocProp} \\ \text{C2} & : \mathit{address}(\text{SPKR}, \text{ADDR}, \text{PENDING}) \end{bmatrix} \\ \text{CONT} = \mathit{Understand}(\text{SPKR}, \text{ADDR}, \text{DGB-PARAMS.PENDING}) & : \mathit{IllocProp} \end{bmatrix}$$

Knowing how to use feedback signals such as the one in (22) can be claimed to be part of linguistic competence. It is difficult to imagine how to model this aspect of linguistic knowledge other than by means of *grammar in dialogue*.

Dialogue gameboard structures as defined in (19) as well as lexical entries for interjections such as (22) are still *static*. The mechanism that is responsible for the dynamics of dialogue and regiments the interactive evolution of DGBs is the *conversational rule*. A conversational rule is a mapping between an input and an output information state, where the input DGB is constrained by a type labelled *preconditions* (PRE) and the output DGB is subject to EFFECTS. That is, a conversational rule can be notated in the following form, where *DGBType* is the type of dialogue gameboards defined in (19).

$$\text{(23)}\quad \begin{bmatrix} \text{PRE} & : \mathit{DGBType} \\ \text{EFFECTS} & : \mathit{DGBType} \end{bmatrix}$$

Several basic conversational rules are defined in Ginzburg (2012: Chapter 4) and some of them, namely those needed to analyze example (8) discussed above, are given again below (with "Fact update/QUD-downdate" being simplified, however). *IllocProp* abbreviates "Illocutionary Proposition", *IllocRel* "Illocutionary Relation", *poset* "Partially Ordered Set", *AbSemObj* "Abstract Semantic Object" and *QSPEC* "Question-under-Discussion-Specific". With regard to the partially ordered QUD set, we use "$\langle q, Q \rangle$" to denote the poset in which $q$ is the upper bound of the subset $Q$. For details, we have to refer the reader to Ginzburg (2012); we believe the following list conveys at least a solid impression of how dialogue dynamics works in KoS, however.

• Free Speech:

$$\begin{bmatrix} \texttt{PRE} & : \left[ \texttt{Qun} = \langle \rangle : post(\texttt{Qun} \texttt{tion}) \right] \\\\ \texttt{EFFECTs} & : \texttt{TurnUenderspec} \wedge\_{\texttt{merge}} \begin{bmatrix} \texttt{R} : \texttt{ABSEMO3} \texttt{J} \\ \texttt{R} : \texttt{LucocRE} \\ \texttt{LATEstMove} = \mathsf{R} \left( \texttt{sprKn}, \texttt{ADD}, \texttt{R} \right) : \texttt{LLocPR} \end{bmatrix} \end{bmatrix}$$

• QSPEC:

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{QUD} = \langle \text{q}, \text{Q} \rangle : \mathit{poset(Question)} \end{bmatrix} \\ \text{EFFECTS} & : \mathit{TurnUnderspec} \wedge_{merge} \begin{bmatrix} \text{r} & : \mathit{AbSemObj} \\ \text{R} & : \mathit{IllocRel} \\ \text{LATESTMOVE} = \text{R(SPKR, ADDR, r)} & : \mathit{IllocProp} \\ \text{c1} & : \mathit{Qspecific}(\text{r}, \text{q}) \end{bmatrix} \end{bmatrix}$$


 • Ask QUD-incrementation:

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{q} & : \mathit{Question} \\ \text{LATESTMOVE} = \text{ASK(SPKR, ADDR, q)} & : \mathit{IllocProp} \end{bmatrix} \\ \text{EFFECTS} & : \begin{bmatrix} \text{QUD} = \langle \text{q}, \text{PRE.QUD} \rangle : \mathit{poset(Question)} \end{bmatrix} \end{bmatrix}$$

• Assert QUD-incrementation:

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{p} & : \mathit{Prop} \\ \text{LATESTMOVE} = \text{ASSERT(SPKR, ADDR, p)} & : \mathit{IllocProp} \end{bmatrix} \\ \text{EFFECTS} & : \begin{bmatrix} \text{QUD} = \langle \text{p?}, \text{PRE.QUD} \rangle : \mathit{poset(Question)} \end{bmatrix} \end{bmatrix}$$

• Accept:

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{p} & : \mathit{Prop} \\ \text{LATESTMOVE} = \text{ASSERT(SPKR, ADDR, p)} & : \mathit{IllocProp} \end{bmatrix} \\ \text{EFFECTS} & : \begin{bmatrix} \text{LATESTMOVE} = \text{ACCEPT(SPKR, ADDR, p)} & : \mathit{IllocProp} \end{bmatrix} \end{bmatrix}$$
• Fact update/QUD-downdate (simplified into two variants):

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{q} & : \mathit{Question} \\ \text{p} & : \mathit{Prop} \\ \text{LATESTMOVE} = \text{ACCEPT(SPKR, ADDR, p)} & : \mathit{IllocProp} \\ \text{QUD} = \langle \text{q}, \text{SUBQUD} \rangle & : \mathit{poset(Question)} \\ \text{c1} & : \mathit{Qspecific}(\text{p}, \text{q}) \end{bmatrix} \\ \text{EFFECTS} & : \begin{bmatrix} \text{FACTS} = \text{PRE.FACTS} \cup \{\text{p}\} & : \mathit{Set(Prop)} \\ \text{QUD} = \text{SUBQUD} & : \mathit{poset(Question)} \end{bmatrix} \end{bmatrix}$$

$$\begin{bmatrix} \text{PRE} & : \begin{bmatrix} \text{p} & : \mathit{Prop} \\ \text{LATESTMOVE} = \text{ACCEPT(SPKR, ADDR, p)} & : \mathit{IllocProp} \\ \text{QUD} = \langle \text{p?}, \text{SUBQUD} \rangle & : \mathit{poset(Question)} \end{bmatrix} \\ \text{EFFECTS} & : \begin{bmatrix} \text{FACTS} = \text{PRE.FACTS} \cup \{\text{p}\} & : \mathit{Set(Prop)} \\ \text{QUD} = \text{SUBQUD} & : \mathit{poset(Question)} \end{bmatrix} \end{bmatrix}$$
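The PRE/EFFECTS format in (23) invites an update-function reading: a conversational rule checks that the current gameboard is of the precondition type and, if so, delivers a gameboard of the effects type. The following Python sketch (plain dicts as gameboards; all names are ours, not from the KoS literature) plays through Ask QUD-incrementation and the simplified Fact update/QUD-downdate:

```python
# Conversational rules as guarded update functions over toy gameboards.

def ask_qud_increment(gb, q):
    """If the latest move asks q, push q as the maximal QUD element."""
    assert gb["latest_move"] == ("ask", gb["spkr"], gb["addr"], q)
    gb["qud"] = [q] + gb["qud"]
    return gb

def fact_update_qud_downdate(gb, p):
    """If p has been accepted, add p to FACTS and pop the maximal
    (resolved) element off QUD (simplified, as in the rules above)."""
    assert gb["latest_move"][:1] == ("accept",)
    gb["facts"] = gb["facts"] | {p}
    gb["qud"] = gb["qud"][1:]
    return gb

gb = {"spkr": "B", "addr": "A", "qud": [], "facts": {"cg0"},
      "latest_move": ("ask", "B", "A", "q2")}
gb = ask_qud_increment(gb, "q2")          # QUD = [q2]
gb["latest_move"] = ("accept", "B", "A", "p3")
gb = fact_update_qud_downdate(gb, "p3")   # p3 in FACTS, QUD popped
assert gb["qud"] == [] and "p3" in gb["facts"]
```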

Having dialogue game boards and conversational rules at one's disposal, we can apply KoS' analytical tools to the dialogue example from (8) above. We make the following simplifying assumptions: if the $n$th move is an assertion, we refer to the asserted proposition in terms of "p($n$)". The corresponding question *whether p(n)* is notated "p?($n$)". If the $n$th move is a question, we refer to the question in terms of "q($n$)". Additionally, we assume that subsentential utterances project to Austinian propositions by resolving elliptical expressions in context in terms of their missing semantic constituents, which are available as the contents of the maximal elements in QUD (that is, they are addressable via the path QUD.FIRST.CONT; cf. Ginzburg 2012: 68).

(24) *DGB dynamics* (each state is followed by the utterance and the conversational rules that produce it)

init: PARTICIPANTS = {A, B}, MOVES = ⟨⟩, QUD = ⟨⟩, FACTS = cg0

1. SPKR = A, ADDR = B, MOVES = ⟨ASSERT(A,B,p(1))⟩, QUD = ⟨p?(1)⟩, FACTS = cg0
   "I've been at university." / Free Speech + Assert QUD-incrementation

2. SPKR = B, ADDR = A, MOVES = ⟨ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QUD = ⟨q(2)⟩, FACTS = cg0 ∪ {p(1)}
   "Which university?" / Accept + Ask QUD-incrementation

3. SPKR = A, ADDR = B, MOVES = ⟨ASSERT(A,B,p(3)), ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QBG = About(p(3),q(2)), QUD = ⟨p?(3), q(2)⟩, FACTS = cg0 ∪ {p(1)}
   "Cambridge." / QSPEC (via *About* relation) + Assert QUD-incrementation

4. SPKR = B, ADDR = A, MOVES = ⟨ACCEPT(B,A,p(3)), ASSERT(A,B,p(3)), ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QUD = ⟨⟩, FACTS = cg0 ∪ {p(3), p(1)}
   "Cambridge, um." / Accept + Fact update/QUD-downdate

5. SPKR = B, ADDR = A, MOVES = ⟨ASK(B,A,q(5)), ACCEPT(B,A,p(3)), ASSERT(A,B,p(3)), ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QUD = ⟨q(5)⟩, FACTS = cg0 ∪ {p(3), p(1)}
   "what did you read?" / Free Speech + Ask QUD-incrementation

6. SPKR = A, ADDR = B, MOVES = ⟨ASSERT(A,B,p(6)), ASK(B,A,q(5)), ACCEPT(B,A,p(3)), ASSERT(A,B,p(3)), ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QBG = About(p(6),q(5)), QUD = ⟨p?(6), q(5)⟩, FACTS = cg0 ∪ {p(3), p(1)}
   "History and English." / QSPEC (via *About* relation) + Assert QUD-incrementation

7. SPKR = B, ADDR = A, MOVES = ⟨ACCEPT(B,A,p(6)), ASSERT(A,B,p(6)), ASK(B,A,q(5)), ACCEPT(B,A,p(3)), ASSERT(A,B,p(3)), ASK(B,A,q(2)), ASSERT(A,B,p(1))⟩, QUD = ⟨⟩, FACTS = cg0 ∪ {p(6), p(3), p(1)}
   "History and English." / Accept + Fact update/QUD-downdate


Note that the dialogical exchange leads to an increase of the common ground of the interlocutors A and B: after chatting, the common ground contains the propositions *that A has been at university* (p(1)), *that A has been at Cambridge University* (p(3)) and *that A read History and English* (p(6)).

On these grounds, a lexical entry for "hello" can be spelled out. "Hello" realizes a greeting move (which is its content) and must be used discourse-initially (the MOVES list and the QUD set have to be empty):

$$\text{(25)}\quad \begin{bmatrix} \text{PHON} & : \mathit{hello} \\ \text{CAT} & : \begin{bmatrix} \text{HEAD} = \mathit{interjection} : \mathit{syncat} \end{bmatrix} \\ \text{DGB-PARAMS} & : \begin{bmatrix} \text{SPKR} & : \mathit{Ind} \\ \text{ADDR} & : \mathit{Ind} \\ \text{MOVES} = \langle\rangle & : \mathit{list(IllocProp)} \\ \text{QUD} = \emptyset & : \mathit{poset(Question)} \end{bmatrix} \\ \text{CONT} = \text{GREET(SPKR, ADDR)} & : \mathit{IllocProp} \end{bmatrix}$$

Discourse-dynamically, "hello" puts a greeting move onto the MOVES list of the dialogue gameboard, thereby initiating an interaction and inviting a countergreeting (the requirement for a countergreeting is exactly that a greeting move is the sole element of the otherwise empty list of dialogue moves), giving rise to an *adjacency pair* as part of the local management system for dialogues investigated in conversation analysis (Schegloff & Sacks 1973).

The discourse particle "yes" can be used to answer a polar (yes/no) question. In this use, "yes" has a propositional content that asserts the propositional content $p$ of the polar question $p?$, which has to be the maximal element in QUD (Ginzburg 2012: Chapter 2, 231 *et seq.*). That is, "yes" affirmatively resolves a given polar question. Polar questions, in turn, are 0-ary propositional abstracts (Ginzburg 2012: 231), that is, the polar question $p?$ corresponding to a proposition $p$ is a function mapping an empty record to $p$: $p? = \lambda r{:}[\,]\,.\,p$. Thus, applying $p?$ to an empty record $[\,]$ returns $p$, which is exactly what "yes" does. The affirmative particle (used to answer a yes/no question) is a propositional lexeme which applies a polar question which is maximal in QUD to an empty record (cf. Ginzburg 2012: 232):

$$\text{(26)}\quad \begin{bmatrix} \text{PHON} & : \mathit{yes} \\ \text{CAT} & : \begin{bmatrix} \text{HEAD} = \mathit{partcl} : \mathit{syncat} \end{bmatrix} \\ \text{DGB-PARAMS} & : \begin{bmatrix} \text{QUD} = \begin{bmatrix} \text{MAX} & : \mathit{PolQuestion} \\ \text{REST} & : \mathit{set(Question)} \end{bmatrix} : \mathit{poset(Question)} \end{bmatrix} \\ \text{CONT} = \text{DGB-PARAMS.QUD.MAX}([\,]) & : \mathit{Prop} \end{bmatrix}$$


Due to its involvement with DGB-PARAMS.QUD, "yes" directly interacts with acceptance and downdating, as described above. For more on this, see Ginzburg (2012).
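The 0-ary abstract view of polar questions behind (26) can be glossed in a few lines of Python (a purely illustrative rendering of $p? = \lambda r{:}[\,]\,.\,p$; the names are ours):

```python
p = "A-read-History-and-English"   # stand-in for a proposition
polar_p = lambda r: p              # p? as a 0-ary propositional abstract
assert polar_p({}) == p            # "yes": apply the maximal QUD element to []
```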

# **5 Outlook**

Given a basic framework for formulating and analyzing content in dialogue context, there are various directions to explore, including the following ones.


Finally, we want to mention two other dialogue-theoretic frameworks that have been worked out to a substantial degree, namely PTT (Traum 1994; Poesio 1995; Poesio & Traum 1997; Poesio & Rieser 2010), and *Segmented Discourse Representation Theory* (SDRT) (Asher 1993; Asher & Lascarides 2003; 2013; Hunter & Asher 2015). The phenomena and outlook directions discussed in this chapter apply to all theories of dialogue semantics, of course.


# **Appendix: An HPSGTTR fragment**

The appendix provides a fragment of HPSGTTR. The grammar framework used is oriented towards a *Head-driven Phrase Structure Grammar* variant (Sag et al. 2003), namely its TTR implementation (Cooper 2008). We use HPSG because its architecture satisfies the property of *incremental correspondence* (Johnson & Lappin 1999) – utterance representations encode phonological, syntactic, semantic and contextual information *fractally*. This is crucial *inter alia* for any treatment of clarification interaction (cf. Section 3.1). We use HPSGTTR because the type-theoretical version allows us to directly incorporate semantic objects (cf. Section 3.2).

TTR has a counterpart to unification, namely the *merge* construction. Since merge types are complicated to define (but see Cooper 2012), we follow the strategy of Cooper (2017) and illustrate the working of merges by means of some examples:

$$\begin{aligned} \text{(i)} \quad & \begin{bmatrix} \text{A} : T \\ \text{B} : R \end{bmatrix} \wedge_{merge} \begin{bmatrix} \text{B} : R \\ \text{C} : S \end{bmatrix} = \begin{bmatrix} \text{A} : T \\ \text{B} : R \\ \text{C} : S \end{bmatrix} \\ \text{(ii)} \quad & \begin{bmatrix} \text{A} : T \end{bmatrix} \wedge_{merge} \begin{bmatrix} \text{A} : R \end{bmatrix} = \begin{bmatrix} \text{A} : T \wedge_{merge} R \end{bmatrix} \end{aligned}$$
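On the simplifying assumption that a (non-dependent) record type is a finite mapping from labels to types, the merge examples in (i) and (ii) can be played through with a small recursive function; this sketches the examples only, not Cooper's (2012) general definition:

```python
def merge(t1, t2):
    """Toy record-type merge over dicts: union of fields; on a shared
    label, recurse into record types or conjoin the two types."""
    out = dict(t1)
    for label, typ in t2.items():
        if label not in out:
            out[label] = typ
        elif out[label] == typ:
            pass                                    # identical: keep one copy
        elif isinstance(out[label], dict) and isinstance(typ, dict):
            out[label] = merge(out[label], typ)     # recurse into record types
        else:
            out[label] = ("meet", out[label], typ)  # T merged with R on a clash
    return out

# (i): fields are unioned, shared identical fields collapse
assert merge({"A": "T", "B": "R"}, {"B": "R", "C": "S"}) == \
       {"A": "T", "B": "R", "C": "S"}
# (ii): a genuine clash yields the merge (meet) of the two types
assert merge({"A": "T"}, {"A": "R"}) == {"A": ("meet", "T", "R")}
```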

Structure sharing is indicated by a "tag type" notation. Tag types are defined in terms of manifest fields.<sup>8</sup> The notational convention is exemplified in (28) by means of head-specifier agreement, where the tag type from (28a) abbreviates the structure in (28b):

$$\begin{aligned} \text{(28)} \quad & \text{a. } \begin{bmatrix} \text{CAT} : \begin{bmatrix} \text{HEAD} : \begin{bmatrix} \text{AGR}\,\boxed{1} : \mathit{Agr} \end{bmatrix} \\ \text{SPR} : \left\langle \begin{bmatrix} \text{CAT} : \begin{bmatrix} \text{HEAD} : \begin{bmatrix} \text{AGR} = \boxed{1} : \mathit{Agr} \end{bmatrix} \end{bmatrix} \end{bmatrix} \right\rangle \end{bmatrix} \\ & \text{b. } \begin{bmatrix} \text{CAT} : \begin{bmatrix} \text{HEAD} : \begin{bmatrix} \text{AGR} : \mathit{Agr} \end{bmatrix} \\ \text{SPR} : \left\langle \begin{bmatrix} \text{CAT} : \begin{bmatrix} \text{HEAD} : \begin{bmatrix} \text{AGR} = \text{CAT.HEAD.AGR} : \mathit{Agr} \end{bmatrix} \end{bmatrix} \end{bmatrix} \right\rangle \end{bmatrix} \end{aligned}$$

The tag type notation alludes to the box notation common in HPSG work.

<sup>8</sup>*NB:* technically, tag types apply singleton types to record types, instead of to objects, thereby making use of a revision of the notion of singleton types introduced by Cooper (2013: 4, footnote 3).


*Agr* is defined as usual:

$$\text{(29)}\quad \mathit{Agr} := \begin{bmatrix} \text{NUM} & : \mathit{Num} \\ \text{PERS} & : \mathit{Pers} \\ \text{GEN} & : \mathit{Gen} \end{bmatrix}$$

A basic *sign* combines phonological, syntactic, contextual and semantic information and follows the geometry in (30):

$$\text{(30)}\quad \mathit{sign} := \begin{bmatrix} \text{PHON} & : \mathit{Phoneme} \\ \text{CAT} & : \mathit{SynCat} \\ \text{DGB-PARAMS} & : \mathit{RecType} \\ \text{CONT} & : \mathit{SemObj} \end{bmatrix}$$

Signs employ DGB-PARAMS, which host referential meanings that are witnessed among interlocutors. Quantificational abstraction is achieved by coercing parts of DGB-PARAMS to Q-PARAMS:

(31) If DGB-PARAMS : $T_2$ and, for two record types $T_0$ and $T_1$ lacking any mutual dependencies,<sup>9</sup> $T_2 = T_0 \wedge T_1$, then $T_0$ can be moved to Q-PARAMS, resulting in the following structure:

$$\begin{bmatrix} \text{DGB-PARAMS} & : T_1 \\ \text{CONT} & = \begin{bmatrix} \text{Q-PARAMS} : T_0 \end{bmatrix} \end{bmatrix}$$
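The coercion in (31) amounts to splitting the DGB-PARAMS record type and relocating one conjunct under CONT.Q-PARAMS. A Python sketch with record types as dicts (function and field names are illustrative; the example fields echo the attributive *thief* use from Section 4):

```python
def coerce_to_q_params(sign, labels):
    """Move the fields named in `labels` from DGB-PARAMS to CONT.Q-PARAMS
    (assumes the two parts share no dependencies, cf. footnote 9)."""
    dgb = dict(sign["DGB-PARAMS"])
    q_params = {l: dgb.pop(l) for l in labels}
    new_sign = dict(sign)
    new_sign["DGB-PARAMS"] = dgb
    new_sign["CONT"] = {**sign.get("CONT", {}), "Q-PARAMS": q_params}
    return new_sign

sign = {"DGB-PARAMS": {"X": "Ind", "C1": "thief(X)"}, "CONT": {}}
attributive = coerce_to_q_params(sign, ["X", "C1"])
assert attributive["DGB-PARAMS"] == {} and "Q-PARAMS" in attributive["CONT"]
```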

A word is a sign with constituent type (CXTYPE) *word*. Using the merge operation, the word extension on signs can be represented compactly as in (32a), which expands to the structure given in (32b):

$$\begin{aligned} \text{(32)} \quad & \text{a. } \mathit{word} := \mathit{sign} \wedge_{merge} \begin{bmatrix} \text{CXTYPE} : \mathit{word} \end{bmatrix} : \mathit{RecType} \\ & \text{b. } \begin{bmatrix} \text{CXTYPE} & : \mathit{word} \\ \text{PHON} & : \mathit{Phoneme} \\ \text{CAT} & : \mathit{SynCat} \\ \text{DGB-PARAMS} & : \mathit{RecType} \\ \text{CONT} & : \mathit{SemObj} \end{bmatrix} \end{aligned}$$

Words – that is, signs with CXTYPE value *word* – are usually the result of lexical rules, whose inputs are lexemes. Lexemes differ from words in their constituent type:

$$\text{(33)}\quad \mathit{lexeme} := \mathit{sign} \wedge_{merge} \begin{bmatrix} \text{CXTYPE} : \mathit{lexeme} \end{bmatrix} : \mathit{RecType}$$

Phrases can be headed or non-headed structures. A headed phrase is a phrase with a prominent daughter, i.e. the head daughter:

<sup>9</sup>None of the labels occurring in $T_0$ occur in $T_1$ and vice versa.


$$\text{(34)}\quad \mathit{hd\text{-}phrase} := \mathit{phrase} \wedge_{merge} \begin{bmatrix} \text{DTRS} : \begin{bmatrix} \text{HD-DTR} : \mathit{Sign} \end{bmatrix} \end{bmatrix} : \mathit{RecType}$$


 The head daughter is special since it (as a default, at least) determines the syntactic properties of the mother construction. This aspect of headedness is captured in terms of the *Head-Feature Principle* (HFP), which can be implemented by means of tag types as follows:

$$\text{(35)}\quad \text{HFP} := \begin{bmatrix} \text{CAT} & : \begin{bmatrix} \text{HEAD}\,\boxed{1} : \mathit{PoS} \end{bmatrix} \\ \text{HD-DTR} & : \begin{bmatrix} \text{CAT} : \begin{bmatrix} \text{HEAD} = \boxed{1} : \mathit{PoS} \end{bmatrix} \end{bmatrix} \end{bmatrix}$$

The fact that the daughters' locutions combine into the mother's PHON value is captured in terms of a "Phon Principle" (we use a slash notation in order to indicate paths starting at the outermost level of a feature structure):

$$\text{(36)}\quad \text{PHON} := \begin{bmatrix} \text{CXTYPE} & : \mathit{phrase} \\ \text{PHON} & : \mathit{List}(/\text{HD-DTR.PHON}, /\text{NHD-DTR.POS1.PHON}, \ldots, /\text{NHD-DTR.POS}\mathit{n}\text{.PHON}) \end{bmatrix}$$

 Since semantic composition rests on predication rather than unification, there is no analog to the semantic compositionality principle of Sag et al. (2003) in our account. There is, however, something akin to semantic inheritance: we need to keep track of the contextual and quantificational parameters contributed by the daughters of a phrase. This is achieved in terms of the *DGB-Params Principle* (*DGBPP*) in (37) which unifies the daughters' DGB-PARAMS into the mother's DGB-PARAMS (see Ginzburg 2012: 126 *et seq.* for a similar principle):

$$\text{(37)}\quad \text{DGBPP} := \begin{bmatrix} \text{CXTYPE} & : \mathit{phrase} \\ \text{DGB-PARAMS} & = /\text{HD-DTR.DGB-PARAMS} \wedge_{merge} /\text{NHD-DTR.POS1.DGB-PARAMS} \wedge_{merge} \ldots : \mathit{RecType} \end{bmatrix}$$


A headed phrase is well-formed if it obeys the Head-Feature Principle, the Phon Principle and the DGB-Params Principle, which is expressed by extending *hd-phrase* by the following constraint:


(38) *hd-phrase* := *hd-phrase* ∧*merge* HFP ∧*merge* PHON ∧*merge* DGBPP
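Read procedurally, (38) conjoins the bare *hd-phrase* type with the three principles by iterated merging. A toy rendering (record types as label-to-type dicts; the field values are mere placeholders, and genuine type clashes are not handled):

```python
from functools import reduce

def merge(t1, t2):
    """Toy merge: union of fields, descending recursively into shared
    record-valued fields (type clashes are not handled here)."""
    out = dict(t1)
    for label, typ in t2.items():
        if isinstance(out.get(label), dict) and isinstance(typ, dict):
            out[label] = merge(out[label], typ)
        else:
            out[label] = typ
    return out

hd_phrase = {"DTRS": {"HD-DTR": "Sign"}}                   # cf. (34)
HFP = {"CAT": {"HEAD": "#1:PoS"},
       "HD-DTR": {"CAT": {"HEAD": "=#1:PoS"}}}             # cf. (35)
PHON = {"CXTYPE": "phrase", "PHON": "List(...)"}           # cf. (36)
DGBPP = {"DGB-PARAMS": "merge of daughters' DGB-PARAMS"}   # cf. (37)

well_formed = reduce(merge, [hd_phrase, HFP, PHON, DGBPP])  # cf. (38)
```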

Using this set-up, lexical entries, lexical rules and syntactic constructions can be formulated straightforwardly.

# **Acknowledgments**

The work on this chapter by Lücking is partially supported by a public grant overseen by the French National Research Agency (ANR) as part of the program "Investissements d'Avenir" (reference: ANR-10-LABX-0083). It contributes to the IdEx Université de Paris – ANR-18-IDEX-0001. Cooper's work was supported by the projects InCReD (Incremental Reasoning in Dialogue), VR project 2016-01162 and DRiPS (Dialogical Reasoning in Patients with Schizophrenia), Riksbankens jubileumsfond, P16-0805:1 and a grant from the Swedish Research Council (VR project 2014-39) for the establishment of the Centre for Linguistic Theory and Studies in Probability (CLASP) at the University of Gothenburg. The authors want to thank several anonymous reviewers for their comments. We thank in particular Bob Borsley, Stefan Müller and Frank Richter for detailed discussions and suggestions on earlier drafts of this chapter. Furthermore, we are grateful to Elizabeth Pankratz for her attentive remarks and for proofreading.

# **References**



Linguistic Sciences 24), 215–232. Urbana, IL: Department of Linguistics, University of Illinois.



*Theory* (Studies in Linguistics and Philosophy 42). Dordrecht: Kluwer Academic Publishers. DOI: 10.1007/978-94-017-1616-1.



ogy and Syntax), 1001–1042. Berlin: Language Science Press. DOI: 10 . 5281 / zenodo.5599862.



*book* (Empirically Oriented Theoretical Morphology and Syntax), 1201–1250. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599872.



ogy and Syntax), 1497–1553. Berlin: Language Science Press. DOI: 10 . 5281 / zenodo.5599882.



*meeting of the special interest group on discourse and dialogue (SIGDIAL 2016)*, 360–369. Los Angeles, CA: Association for Computational Linguistics. DOI: 10.18653/v1/W16-3645.



(eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 1395–1446. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599878.

Zimmermann, Thomas Ede. 2011. Model-theoretic semantics. In Claudia Maienborn, Klaus von Heusinger & Paul Portner (eds.), *Semantics: An international handbook of natural language meaning*, vol. 1 (Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science (HSK) 33), 762–802. Berlin: Mouton de Gruyter. DOI: 10 . 1515 / 9783110226614.946.

# **Chapter 27**

# **Gesture**

# Andy Lücking

Université de Paris, Goethe-Universität Frankfurt

The received view in (psycho)linguistics, dialogue theory and gesture studies is that co-verbal gestures, i.e. hand and arm movement, are part of the utterance and contribute to its content (Kendon 1980; McNeill 1992). The relationships between gesture and speech obey regularities that need to be defined in terms of not just the relative timing of gesture to speech, but also the linguistic form of that speech: for instance, prosody and syntactic constituency and headedness (Loehr 2007; Ebert et al. 2011; Alahverdzhieva et al. 2017). Consequently, speech–gesture integration is captured in grammar by means of a gesture-grammar interface. This chapter provides basic snapshots from gesture research, reviews constraints on speech– gesture integration and summarizes their implementations into HPSG frameworks. Pointers to future developments conclude the exposition. Since there are already a couple of overviews on gesture such as Özyürek (2012), Wagner et al. (2014) and Abner et al. (2015), this chapter aims at distinguishing itself by providing a guided tour of research that focuses on using (mostly) standard methods for semantic composition in constraint-based grammars like HPSG to model gesture meanings.

# **1 Why gestures?**

People talk with their whole body. A verbal utterance is couched in an intonation pattern that, via prosody, articulation speed or stress, functions as a *paralinguistic* signal (e.g. Birdwhistell 1970). The temporal dimension of paralinguistics gives rise to *chronemic* codes (Poyatos 1975; Bruneau 1980). *Facial expressions* are commonly used to signal emotional states (Ekman & Friesen 1978), even without speech (Argyle 1975), and correlate with different illocutions of the speech acts performed by a speaker (Domaneschi et al. 2017). Interlocutors use *gaze* as a mechanism to achieve joint attention (Argyle & Cook 1976) or provide social signals (Kendon 1967).

Andy Lücking. 2021. Gesture. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1201–1250. Berlin: Language Science Press. DOI: 10.5281/zenodo. 5599872


Distance and relative direction of speakers and addressees are organized according to culture-specific radii into social spaces (*proxemics*, Hall 1968). Within the inner radius of private space, tactile codes of *tacesics* (Kauffman 1971) are at work. Since the verbal and nonverbal communication means of face-to-face interaction may occur simultaneously, *synchrony* (i.e. the mutual overlap or relative timing of verbal vs. non-verbal communicative actions) is a feature of the multimodal utterance itself; it contributes, for instance, to identifying the word(s) that are affiliated with a gesture (Wiltshire 2007). A special chronemic case is signalling at the right moment – or, for that matter, missing the right moment (an aspect of communication dubbed *kairemics* by Lücking & Pfeiffer 2012: 600). Notwithstanding the manifold nonverbal communication means, the conventionalized, symbolic nature of language secures language's primacy in communication, however (de Ruiter 2004). For thorough introductions to semiotics and multimodal communication see Nöth (1990), Posner et al. (1997–2004) or Müller, Cienki, Fricke, Ladewig, McNeill & Tessendorf (2013); Müller, Cienki, Fricke, Ladewig, McNeill & Bressem (2013).

The most conspicuous non-verbal communication means of everyday interaction are hand and arm movements, known as *gestures* (in a more narrow sense which is also pursued from here on). In seminal works, McNeill (1985; 1992) and Kendon (1980; 2004) argue that co-verbal gestures, i.e. hand and arm movements, can be likened to words in the sense that they are part of a speaker's utterance and contribute to discourse. Accordingly, integrated speech–gesture production models have been devised (Kita & Özyürek 2003; de Ruiter 2000; Krauss et al. 2000) that treat utterance production as a multimodal process (see Section 4.4 for a brief discussion). Given gestures' imagistic and often spontaneous character, it is appealing to think of them as "postcards from the mind" (de Ruiter 2007: 21). Clearly, given this entrenchment in speaking, the fact that one can communicate meaning with non-verbal signals has repercussions for areas hitherto taken to be purely linguistic (in the sense of being related to the verbal domain). This section highlights some phenomena particularly important for grammar, including, for instance, *mixed syntax* (Slama-Cazacu 1976) or *pro-speech gesture*:

### (1) He is a bit [*circular movement of index finger in front of temple*].

In (1), a gesture replaces a position that is usually filled by a syntactic constituent. The gesture is emblematically related to the property of *being mad* so that the mixed utterance from (1) is equivalent to the proposition that the referent of *he* is a bit mad.


Figure 1: *Die Skulptur die hat 'n* [*BETONsockel*] 'The sculpture has a concrete base' [V5, 0:39]

The gesture shown in Figure 1 depicts the shape of a concrete base, which the speaker introduces into discourse as an attribute of a sculpture:<sup>1</sup>

(2) Die Skulptur die hat 'n [BETONsockel].
    the sculpture it has a concrete.base
    'The sculpture has a concrete base.'

The following representational conventions obtain: square brackets roughly indicate the portion of speech which overlaps temporally with the gesture (or more precisely, with the gesture stroke; see Figure 5 below) and upper case is used to mark main stress or accent. So both timing and intonation give clues that the gesture is related to the noun *Betonsockel* 'concrete base'. From the gesture, but not from speech, we get that the concrete base of the sculpture has the shape of a flat cylinder – thus, the gesture acts as a nominal modifier. There is a further complication, however: the gesture is incomplete with regard to its interpretation – it just depicts about half of a cylinder. Thus, gesture interpretation may involve processes known from gestalt theory (see Lücking 2016 on a *good continuation* constraint relevant to (2)/Figure 1).

The speaker of the datum in Figure 2 uses just a demonstrative adverb in order to describe the shape of a building he is talking about:

<sup>1</sup>The examples in Figures 1, 2, 3, 4, 11 and 12 are drawn from the (German) *Speech and Gesture Alignment* corpus (SaGA, Lücking et al. 2010) and are quoted according to the number of the dialogue they appear in and their starting time in the respective video file (e.g. "V9, 5:16" means that the datum can be found in the video file of dialogue V9 at minute 5:16). Examples/Figures 4 and 11 have been produced especially for this volume; all others have also been used in Lücking (2013) and/or Lücking (2016).


Figure 2: *Dann ist das Haus halt SO* [] 'The house is like this []' [V11, 2:32]

(3) Dann ist das Haus halt SO [].
    then is the house just like.this []
    'The house is like this [].'

The demonstrative shifts the addressee's attention to the gesture, which accomplishes the full shape description, namely a cornered U-shape. In contrast to the example in Figure 1, the utterance associated with Figure 2 is not even interpretable without the gesture.

A lack of interpretability is shared by exophorically used demonstratives, which are *incomplete* without a demonstration act like a pointing gesture (Kaplan 1989: 490). For instance, Claudius would experience difficulties in understanding how serious Polonius is about his (Polonius') conjecture about the reason of Hamlet's (alleged) madness, if Polonius had not produced pointing gestures (Shakespeare, *Hamlet, Prince of Denmark* Act II, Scene 2; the third occurrence of *this* is anaphoric and refers back to Polonius' conjecture):

(4) POLONIUS (*points to his head and shoulders*): Take this from this if this be otherwise.

In order for Claudius to interpret Polonius' multimodal utterance properly, he has to correctly associate the two pointing gestures with the first two occurrences of *this* (cf. Kupffer 2014). Polonius facilitates such an interpretation by means of a temporal coupling of pointing gestures and their associated demonstratives – a relationship that is called *affiliation*. The role of synchrony in multimodal utterances is further illustrated by the following example, (5), and Figure 3 (taken from Lücking 2013: 189):


(5) Ich g[laube das sollen TREP]pen sein.
    I think those should staircases be
    'I think those should be staircases.'

The first syllable of the German noun *Treppen* (*staircases*) carries main stress, indicated by capitalisation. The square brackets indicate the temporal overlap between speech and gesture stroke, which is shown in Figure 3. The gesture attributes a property to the noun it attaches to: from the multimodal utterance, the observer retrieves the information that the speaker talks about spiral staircases. This interpretation assumes that the common noun is the affiliate of the gesture. Obviously, mere temporal synchrony is too weak to be an indicator of affiliation. In fact, there are speech–gesture affiliations without any temporal overlap between gesture and verbal affiliate at all (e.g. Lücking et al. 2004). Therefore, temporal overlap or vicinity is just one indicator of affiliation. A second one is intonation: a gesture is usually related to a stressed element in speech (Loehr 2007: 209) (McClave 1994, however, found that beat gestures also co-occur with unstressed words, namely non-initial beats that are produced in a beat gesture series). As a result, multimodal communication gives rise to a complex "peak pattern" (Tuite 1993: 98, Loehr 2004: 111).

The interpretation of a gesture changes with different affiliations. Suppose the gesture from Figure 3 is produced accompanying stressed *glaube* (*think*) instead of *Treppen* (*staircases*):


(6) Ich G[LAUbe das sollen Trep]pen sein.
    I think those should staircases be
    'I think those should be staircases.'

Now the spiral movement is interpreted as a metaphorical depiction of a psychological process. Thus, the interpretation of a gesture depends on the integration point (affiliation), which in turn is marked by temporal vicinity, prosody and syntactic constituency of the candidate affiliate (Alahverdzhieva et al. 2017).

The crucial observations in any case are that gestures contribute to propositional content and take part in pragmatic processes. Interestingly, gestures share the latter aspect with laughter, which also has propositional content (Ginzburg et al. 2015), for instance, when referring to real world events. Thus, a multimodal utterance may express a richer content than speech alone, as in (5), or a content equivalent to speech, as in (6); it can even express less than speech or contradict speech:<sup>2</sup>

The nonverbal act can repeat, augment, illustrate, accent, or contradict the words; it can anticipate, coincide with, substitute for or follow the verbal behaviour; and it can be unrelated to the verbal behaviour. (Ekman & Friesen 1969: 53)

Contradictions or speech–gesture mismatches can occur when saying "right" but pointing left (as can be observed in everyday life, but has also been found in SaGA, e.g. in dialogue V24, at 4:50). A more complex case is given in (7) and Figure 4, where the speaker talks about a "rectangular arch" (which is of course a *contradictio in adiecto* in itself), but produces a roundish movement with the extended index finger of her right hand (the object she talks about is an archway). Note that the gesture just overlaps with "rectangular": its temporal extension in (7) is again indicated by means of square brackets within the original German utterance. The main stress is on the first syllable of the adjective and the noun receives secondary stress. The dots ("..") mark a short pause, so the gesture starts before "rechteckiger".

(7) so'n so'ne Art [.. RECHTecki]ger BOgen
    such.an such kind.of rectangular arch
    'kind of rectangular arch'

An obvious interpretation of this mismatch is that "rectangular" is a slip of the tongue; interestingly, we found no "slip of the hand" in our data so far (which

<sup>2</sup> In case of contradiction or speech–gesture mismatch, the resulting multimodal utterance is perceived as ill-formed and induces N400 effects (Wu & Coulson 2005; Kelly et al. 2004).


Figure 4: *so'n so'ne Art* [.. *RECHTecki*]*ger BOgen* 'kind of rectangular arch' [V4, 1:47].

may hint at a possibly imagistic origin of gestures, as assumed in some production models; cf. Section 4.4).

Moving from sentence to dialogue, *interactive gestures* are bound up with turn management, among other things (Bavelas et al. 1992; 1995). For instance, pointing gestures can be used to indicate the next speaker (Rieser & Poesio 2009). Interestingly, speaker-indicating pointings are typically not produced with an outstretched index finger, but with an open hand (an example is given in Figure 16 in Section 3.6). Thus, irrespective of the question whether grammar is inherently multimodal, dialogue theory has to deal at least with certain non-verbal interaction means in any case (see also Lücking, Ginzburg & Cooper 2021, Chapter 26 of this volume).

While there is ample evidence that at least some gestures contribute to the content of the utterance they co-occur with, does this also mean that they are part of the content *intended to be communicated*? A prominent counter-example is gesturing on the telephone (see Bavelas et al. 2008 for an overview of a number of respective studies). Since such gestures are not observable for the addressee, they cannot reasonably be taken to be a constituent of the content intended for communication. Rather, "telephone gestures" seem to be speaker-oriented, presumably facilitating word retrieval. The fact that it is difficult to suppress gesturing even in the absence of an addressee speaks in favour of a multimodal nature if not of language, then at least of speaking and surely of interacting. Furthermore, the lion's share of everyday gestures seems to consist of rather sloppy movements that do not contribute to the content of the utterance in any interesting sense, though they might signal other information like speaker states. In this sense they are contingent, as opposed to being an obligatory semantic component (Lücking 2013). Gestures (or some other demonstration act) can become obligatory when they are produced within the scope of a demonstrative expression (recall (3)/Figure 2).


A concurrent use with demonstratives is also one of the hallmarks collected by Cooperrider (2017) in order to distinguish *foreground* from *background* gestures (the other hallmarks are absence of speech, co-organisation with speaker gaze and speaker effort). This distinction reflects two traditions within gesture studies: according to one tradition most prominently bound up with the work of McNeill (1992), gesture is a *by-product* of speaking and therefore opens a "window into the speaker's mind". The other tradition, represented early on by Goodwin (2003) and Clark (1996), conceives gestures as a *product* of speaking, that is, as interaction means designed with a communicative intention. Since a gesture cannot be both a by-product and a product at the same time, as noted by Cooperrider (2017), a bifurcation that is rooted in the cause and the production process of the gesture has to be acknowledged (e.g. gesturing on the phone is only puzzling from the product view, but not from the by-product one). We will encounter this distinction again when briefly reviewing speech–gesture production models in Section 4.4. Gestures of both species are covered in the following.

# **2 Kinds of gestures**

Pointing at an object seems to be a different kind of gesture than mimicking drinking by moving a bent hand (i.e. virtually holding something) towards the mouth while slightly rotating the back of hand upwards. And both seem to be different from actions like scratching or nose-picking. On such grounds, gestures are usually assigned to one or more classes of a taxonomy of gesture classes. Gestures that fulfil a physiological need (such as scratching, nose-picking, footshaking or pen-fiddling) have been called *adaptors* (Ekman & Friesen 1969) and are not dealt with further here (but see Żywiczyński et al. 2017 for evidence that adaptors may be associated with turn transition points in dialogue). Gestures that have an intrinsic relation to speech and what is communicated have been called *regulators* and *illustrators* (Ekman & Friesen 1969) and cover a variety of gesture classes. These gesture classes are characterized by the function performed by a gesture and the meaning relation the gesture bears to its content. A classic taxonomy consists of the following inventory (McNeill 1992):

• iconic (or representational) gestures. Spontaneous hand and arm movements that are commonly said to be based on some kind of resemblance relation.<sup>3</sup> Iconic gestures employ a mode of representation such as *drawing*, *modelling*, *shaping* or *placing* (Streeck 2008; Müller 1998).

<sup>3</sup>But see footnote 9 in Section 3.5 for pointers to critical discussions of resemblance as a signbearing relation.

• deictic (or pointing) gestures. Hand and arm movements, prototypically performed with an extended index finger, that indicate objects, locations or directions.

• beat gestures. Rhythmic hand and arm movements that structure or emphasize the concurrent speech.

• emblems. Conventionalized gestures with a lexicalized meaning, like the "thumbs-up" gesture (whose meaning may even be listed in a dictionary<sup>4</sup>). Emblems may also be more local and collected within a dictionary like the dictionary of everyday gestures in Bulgaria (Kolarova 2011).

Reconsidering gestures that have been classified as beats, among other gestures, Bavelas et al. (1992) observed that many of the stroke movements accomplish functions beyond rhythmic structuring or emphasis. Rather, they appear to contribute to dialogue management and have been called *interactive gestures*. Therefore, these gestures should be added to the taxonomy:

• interactive gestures. Hand and arm movements that accomplish the function "of helping the interlocutors coordinate their dialogue" (Bavelas et al. 1995: 394). Interactive gestures include pointing gestures that serve turn allocation ("go ahead, it's your turn") and gestures that are bound up with speaker attitudes or the relationship between speaker and addressee. Examples can be found in 'open palm/palm upwards' gestures used to indicate the information status of a proposition ("as you know") or the mimicking of quotation marks in order to signal a report of direct speech (although this also has a clear iconic aspect).

The gesture classes should not be considered as mutually exclusive categories, but rather as dimensions according to which gestures can be defined, allowing for multi-dimensional cross-classifications (McNeill 2005; Gerwing & Bavelas 2004). For instance, it is possible to superimpose pointing gestures with iconic traits. This has been found in the study on pointing gestures described in Kranstedt et al. (2006a), where two participants at a time were involved in an identification game: one participant pointed at one of several parts of a toy airplane scattered over a table, the other participant had to identify the object pointed at. When pointing at a disk (a wheel of the toy airplane), some participants used index palm down pointing, but additionally turned their index finger around in a circle – that is, the pointing gesture not only locates the disk (deictic dimension) but also depicts its shape (iconic dimension). See Özyürek (2012) for an overview of various gesture classification schemes.

In addition to classifying gestures according to the above-given functional groups, a further distinction is usually made with regard to the ontological place of their referent: representational and deictic gestures can relate to concrete or to abstract objects or scenes. For instance, an iconic drawing gesture can metaphorically display the notion "genre" via a conduit metaphor (McNeill 1992: 14):

<sup>4</sup>https://www.merriam-webster.com/dictionary/thumbs-up, last visited on 20th August 2018. The fact that emblems can be lexicalized in dictionaries emphasizes their special, conventional status among gestures.


Figure 5: Gesture phases

(8) It [was a Sylves]ter and Tweety cartoon. *both hands rise up with open palm handshape, palms facing*; brackets indicate segments concurrent with the gesture stroke (see Figure 5).

The gesture in (8) virtually holds an object, thus depicting the abstract concept of the genre of being a Sylvester and Tweety cartoon as a bounded container. Accordingly, gestures can be cross-classified into *concrete* and *abstract* or *metaphorical* ones (see the volume of Cienki & Müller 2008 on gesture and metaphor).

On the most basic, kinematic level, the movement of a prototypical gesture follows an "anatomic triple": gestures have to be partitioned into at least a preparation, a stroke, and a retraction phase (Kendon 1972). The gesture phases are shown in the diagram in Figure 5. The stroke is the movement part that carries the gesture's meaning. It can be "frozen", leading to a post-stroke hold. If a stroke has to wait for its affiliated expression(s), a pre-stroke hold can also arise (Kita et al. 1998). The preparation and retraction phases bring hand and arms into and out of the stroke, respectively. Unless stated otherwise, when talking about gestures in what follows (and in hindsight concerning the examples given in Section 1), the stroke phase, which is the "gesture proper" or the "semantically interpretable" phase, is referred to.

Perhaps it should be noted that the spontaneous, usually co-verbal hand and arm movements considered in this chapter are different from the signed signs of sign languages and pantomime (neither spontaneous nor co-verbal).<sup>5</sup>

# **3 Gestures in HPSG**

Integrating a gesture's contribution into speech was initiated in computer science (Bolt 1980). Coincidentally, these early works used typed feature structure descriptions akin to the descriptive format used in HPSG grammars. Though linguistically limited, the crucial invention has been a *multimodal chart parser*, that is, an extension of chart parsing that allows the processing of input in two modalities (namely speech and gesture).

<sup>5</sup> In languages like German, the difference between free gesticulation and sign language signs is also reflected terminologically: the former are called *Gesten*, the latter *Gebärden*.


Such approaches are reviewed in Section 3.2. Afterwards, a more elaborate gesture representation format is introduced that makes it possible to encode the observable form of a gesture in terms of kinematically derived attribute-value structures (Section 3.3). Following the basic semiotic distinction between deictic (or indicating or pointing) gestures and iconic (or representational or imagistic) gestures, the analysis of each class of gestures is exemplified in Sections 3.4 and 3.5, respectively. To begin with, however, some basic phenomena that should be covered by a multimodal grammar are briefly summarized in Section 3.1.

# **3.1 Basic empirical phenomena of grammatical gesture integration**

With regard to grammar-gesture integration, three main phenomena have to be dealt with:

1. gesture meaning: what does a gesture mean in the first place?
2. affiliation: to which verbal expression(s) is a gesture related?
3. semantic integration: how do gesture meaning and the meaning of the affiliated expression(s) combine into a multimodal content?
Given the linguistic significance of gestures as sketched in the preceding sections, formal grammar- and semantics-oriented accounts of speech–gesture integration have recently been developed that try to deal with (at least one of) the three basic phenomena, though with different priorities, including Alahverdzhieva (2013), Alahverdzhieva & Lascarides (2010), Ebert (2014), Giorgolo (2010), Giorgolo & Asudeh (2011), Lücking (2013; 2016), Rieser (2008; 2011; 2015), Rieser & Poesio (2009) and Schlenker (2018). It should be noted that the first basic question does not have to be considered a question for grammar, but can be delegated to a foundational theory of gesture meaning. Here gestures turn out to be like words again, where "semantic theory" can refer to explaining meaning (foundational) or specifying meaning (descriptive) (Lewis 1970: 19). In any case, the HPSG-related approaches are briefly reviewed below.

# **3.2 Precursors**

Using typed feature structure descriptions to represent the form and meaning of gestures goes back to computer science approaches to human-computer interaction.


For instance, the *QuickSet* system (Cohen et al. 1997) allows users to operate on a map and move objects or lay out barbed wire (the project was funded by a grant from the US army) by giving verbal commands and manually indicating coordinates. The system processes voice and pen (gesture) input by assigning representations in the form of attribute-value matrices (AVMs) to signals from both media (Johnston 1998; Johnston et al. 1997). For instance, *QuickSet* will move a vehicle to a certain location on the map when asked to *Move this*[☞] *motorbike to here*[☞], where '☞' represents an occurrence of a touch gesture (i.e. pen input).

Since a conventional constraint-based grammar for speech-only input rests on a "unimodal" parser, Johnston (1998) and Johnston et al. (1997) developed a *multimodal chart parser*, which is still a topic of computational linguistics (Alahverdzhieva et al. 2012) (see also Bender & Emerson 2021, Chapter 25 of this volume). A multimodal chart parser consists of two or more layers and allows for layer-crossing charts. The multimodal NP *this*[☞] *motorbike*, for instance, is processed in terms of a multimodal chart parser covering a speech (s) and a gesture (g) layer, as shown in Figure 6.

Figure 6: Illustration of a multimodal chart parser.

A multimodal chart or *multichart* is defined in terms of sets of identifiers from both layers. Possible multicharts from Figure 6 include the following ones:

(9) multichart 1: {[s,0,1], [g,3,4]} multichart 2: {[s,1,2], [g,3,4]}

…
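Since a multichart is just a set of edge identifiers drawn from the different layers, enumerating candidate speech–gesture combinations can be sketched as a product over layers (a toy Python fragment, not the actual QuickSet parser):

```python
from itertools import product

# Edges as (layer, start, end) identifiers, as in Figure 6.
speech_edges  = [("s", 0, 1), ("s", 1, 2)]   # e.g. "this", "motorbike"
gesture_edges = [("g", 3, 4)]                # one pen/touch gesture

# Candidate multicharts: one edge per layer (cf. (9)).
multicharts = [set(combo) for combo in product(speech_edges, gesture_edges)]
assert {("s", 0, 1), ("g", 3, 4)} in multicharts
```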

The basic rule for integrating spatial gestures with speech commands is the *basic integration scheme* (Johnston 1998; Johnston et al. 1997), reproduced in (10):<sup>6</sup>

<sup>6</sup> In (10) the colon notation which is used by the authors of the quoted works is adopted.


 The AVM in (10) implements a mother-daughter structure along the lines of a context-free grammar rule, where a left-hand side (LHS) expands to a right-hand side (RHS). The right-hand side consists of two constituents (daughters DTR1 and DTR2), a verbal expression (*located\_command*) and a gesture. The semantic integration between both modalities is achieved in terms of structure sharing, see tag 5 : the spatial gesture provides the location coordinate for the verbal command.


The bimodal integration is constrained by a set of restrictions, mainly regulating the temporal relationship between speech and gesture (see tags 7 and 10 in the CONSTRAINTS set): the gesture may overlap with its affiliated word in time, or follow it by at most four seconds (see the 4s under CONSTRAINTS). An integration scheme highly akin to that displayed in (10) also underlies current grammar-oriented approaches to deictic and iconic gestures (see Sections 3.4 and 3.5 below).
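The temporal restriction just mentioned is easy to state over time intervals; the following sketch (function name and interval encoding are ours) checks the overlap-or-follow-within-four-seconds condition:

```python
def temporally_integrable(word, gesture, max_lag=4.0):
    """True if the gesture overlaps its affiliate word in time,
    or follows it by at most max_lag seconds (cf. the CONSTRAINTS in (10))."""
    w_on, w_off = word
    g_on, g_off = gesture
    overlaps = g_on < w_off and w_on < g_off
    follows = 0.0 <= g_on - w_off <= max_lag
    return overlaps or follows

assert temporally_integrable((0.0, 0.8), (0.5, 1.2))       # overlap
assert temporally_integrable((0.0, 0.8), (3.0, 3.5))       # follows within 4s
assert not temporally_integrable((0.0, 0.8), (6.0, 6.5))   # too late
```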

# **3.3 Representing gestures with AVMs**

Representing the formal features of gestures in terms of attribute-value matrices has been initiated in robotics (Kopp et al. 2004). A representation format that captures the "phonological", physical-kinematic properties of a gesture is designed according to the moveable junctions of arms and hands. For instance, the representation of the gesture in Figure 3 according to the format used in Lücking et al. (2010) is given in (11):


The formal description of a gestural movement is given in terms of the handshape, the orientations of the palm and the back of the hand (BOH), the movement trajectory (if any) of the wrist and the relation between both hands (synchronicity, SYNC). The handshape is drawn from the fingerspelling alphabet of American Sign Language, as illustrated in Figure 7. The orientations of palm and back of hand are specified with reference to the speaker's body (e.g. *PAB* encodes "palm away from body" and *BUP* encodes "back of hand upwards"). Movement features for the whole hand are specified with respect to the wrist: the starting position is given and the performed trajectory is encoded in terms of the described path and the direction and extent of the movement. Position and extent are given with reference to the *gesture space*, that is, the structured area within the speaker's immediate reach (McNeill 1992: 86–89) – see the left-hand side of Figure 8. Originally, McNeill considered the gesture space as "a shallow disk in front of the speaker, the bottom half flattened when the speaker is seated" (McNeill 1992: 86). However, also acknowledging the distance of the hand from the speaker's body (feature DIST) turns the shallow disk into a three-dimensional space, giving rise to the three-dimensional model displayed on the right-hand side of Figure 8. The gesture space regions known as *center-center*, *center* and *periphery*, possibly changed by location modifiers (*upper right*, *right*, *lower right*, *upper left*, *left*, *lower left*), are now modelled as nested cuboids. Thus, gesture space is structured according to all three body axes: the sagittal, the longitudinal and the transverse axes.

Figure 7: American Sign Language fingerspelling alphabet (image Public Domain by user Ds13 in the English Wikipedia on 18th December 2004, https: //commons.wikimedia.org/wiki/File:Asl\_alphabet\_gallaudet.png)

Annotations straightforwardly transfer to the three-dimensional gesture space model, which is assumed throughout this chapter. Complex movement trajectories through the vector space can describe a rectangular or a roundish path (or mixtures of both). Both kinds of movements are distinguished in terms of *line* or *arc* values of the feature PATH. An example illustrating the difference is given in Figure 9. A brief review of gesture annotation can be found in Section 4.1.
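As an indication of the representation format just described, the following toy Python record sketches how a gesture like the one in Figure 3 could be encoded (attribute names follow the prose; the concrete values are illustrative guesses, not the original annotation):

```python
# Toy kinematic gesture record; attributes follow the prose above,
# the values are illustrative guesses.
gesture = {
    "HANDSHAPE": "G",                 # from the ASL fingerspelling alphabet
    "PALM": "PAB",                    # palm away from body
    "BOH": "BUP",                     # back of hand upwards
    "WRIST": {
        "POSITION": "center-right",   # within gesture space
        "PATH": "arc",                # arc vs. line, cf. Figure 9
        "DIRECTION": "up",
        "EXTENT": "medium",
    },
    "SYNC": None,                     # two-handed synchronicity, if any
}
```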

# **3.4 Pointing Gestures**

Pointing gestures are *the* prototypical referring device: they probably pave the way to reference in both evolutionary and language acquisition perspectives (Bruner 1998; Masataka 2003; Matthews et al. 2012); they are predominant inhabitants of the "deictic level" of language, interleaving the symbolic (and the iconic) levels (Levinson 2006, see also Bühler 1934); they underlie reference in *Naming Games* in computer simulation approaches (Steels 1995) (for a semantic assessment of naming and categorisation games, see Lücking & Mehler 2012).


Figure 8: Gesture Space (left hand side is simplified from McNeill 1992: 89). Although originally conceived as a structured "shallow disk" (McNeill 1992: 86), adding distance information gives rise to a three-dimensional gesture space model as illustrated on the right-hand side.


Figure 9: The same sequence of direction labels can give rise to an open rectangle or a semicircle, depending on the type of concatenation (Lücking 2016: 385).


With regard to deictic gestures, Fricke (2012: Section 5.4) argues that deictic words within noun phrases – her prime example is German *so* 'like this' – provide a *structural*, that is, *language-systematic* integration point between the vocal plane of conventionalized words and the non-vocal plane of body movement. Therefore, in this conception, not only utterance production but *grammar* is inherently multimodal.

The referential import of the pointing gesture has been studied experimentally in some detail (Bangerter & Oppenheimer 2006; Kranstedt et al. 2006b,a; van der Sluis & Krahmer 2007). It turns out that pointings do not rely on a direct "laser" or "beam" mechanism (McGinn 1981). Rather, they serve a (more or less rough) locating function (Clark 1996) that can be modelled in terms of a *pointing cone* (Kranstedt et al. 2006b; Lücking et al. 2015).


This work provides an answer to the first basic question (cf. Section 3.1): pointing gestures have a "spatial meaning" which focuses or highlights a region in relation to the direction of the pointing device. Such a spatial semantic model has been introduced in Rieser (2004) under the name of *region pointing*, where the gesture adds a locational constraint to the restrictor of a noun phrase. In a related way, two different functions of a pointing gesture have been distinguished by Kühnlein et al. (2002), namely singling out an object (12a) or making an object salient (12b).

(12)  a. $\lambda x\,(x = d \wedge P(x))$
      b. $\lambda x\,(\mathit{salient}(x) \wedge P(x))$
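The two functions in (12) translate into restrictor-narrowing operations: (12a) intersects the noun property with identity to the demonstratum, (12b) with a salience condition. A toy Python rendering (the cone test stands in for the spatial model of pointing; all names are ours):

```python
# Region pointing as restrictor narrowing (cf. (12)); in_cone stands in
# for the pointing-cone spatial model.
def singling_out(P, d):
    return lambda x: x == d and P(x)            # (12a): x = d and P(x)

def make_salient(P, in_cone):
    return lambda x: in_cone(x) and P(x)        # (12b): salient(x) and P(x)

red_bolt = lambda x: x in {"bolt1", "bolt2"}    # toy noun denotation
in_cone = lambda x: x == "bolt2"                # toy cone region
assert singling_out(red_bolt, "bolt2")("bolt2")
assert [x for x in ["bolt1", "bolt2"]
        if make_salient(red_bolt, in_cone)(x)] == ["bolt2"]
```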

The approach is expressed in lambda calculus and couched in an HPSG framework. The derivation of the instruction *Take the red bolt* plus a pointing gesture is exemplified in (13).

(13) Take [the &[N<sup>0</sup> [N<sup>0</sup> red bolt]]].

A pointing gesture is represented by means of "&" and takes a syntactic position within the linearized inputs according to the start of the stroke phase. For instance, the pointing gesture in (13) occurred after *the* has been articulated but before *red* is finished. The derivation of the multimodal N<sup>0</sup> constituent is shown in Figure 10.

The spatial model is also adopted in Lascarides & Stone (2009), where the region denoted by pointing is represented by a vector $\vec{p}$. This region is an argument to a function $v$, however, which maps the projected cone region to $v(\vec{p})$, the spacetime talked about, which may be different from the gesture space (many more puzzles of local deixis are collected by Klein 1978 and Fricke 2007).

Let us illustrate some aspects of pointing gesture integration by means of the real world example in (14) and Figure 11, taken from dialogue V5 of the SaGA corpus.

(14) Und man[chmal ist da auch ein EISverkäufer].
     and sometimes is there also an ice.cream.guy
     'And sometimes there's an ice cream guy'

The context in which the gesture appears is the following: the speaker describes a route which goes around a pond. He models the pond with his left hand, a post-stroke hold (cf. Figure 5) kept up over several turns. After having drawn the route around the pond with his right hand, the pointing gesture in Figure 11 is produced. The pointing indicates the location of an ice cream vendor in relation to the pond modelled in gesture space.


Figure 10: Derivation of the sample sentence *Take* [*the* &[N<sup>0</sup> [N<sup>0</sup> *red bolt*]]].

Figure 11: *Und man*[*chmal ist da auch ein EISverkäufer*] 'and sometimes there's an ice cream guy', [V5, 7:20]

pond modelled in gesture space. Such instances of indirect or proxy pointing have been interpreted as *dual points* by Goodwin (2003); in standard semantics they are analysed in terms of *deferred reference*, where one thing is indicated but another, related thing is referred to (Quine 1950; Nunberg 1993). The "duality" or "deference" involved in the datum consists of a mapping from the location indicated in gesture space onto a spatial area of the described real-world situation. Such mappings are accounted for by the function *f* that shifts the pointing cone area from gesture space **v** to some other space *f*(**v**) (Lascarides & Stone 2009). So, the deictic gesture locates the ice cream vendor. Since it is held during nearly the whole utterance, its affiliate expression *Eisverkäufer* 'ice cream guy' is picked out due to carrying primary accent (indicated by capitalization).<sup>7</sup> Within HPSG, such constraints can be formulated within an interface to metrical trees from the phonological model of Klein (2000) or to the information packaging of Engdahl & Vallduví (1996) – see also De Kuthy (2021), Chapter 23 of this volume. The well-developed basic integration scheme of Alahverdzhieva et al. (2017: 445) rests on a strict speech–gesture overlap and is called the *Situated Prosodic Word Constraint*; it allows the combination of a speech daughter (S-DTR) and a gesture daughter (G-DTR):

(15) Situated Prosodic Word Constraint (Alahverdzhieva et al. 2017: 445):

<sup>7</sup>Semantically, other integration points are possible, too, most notably with *da* 'there'. However, the intonation-based integration point patterns well with observations of the affiliation behaviour of iconic gestures, as indicated with respect to examples (5) and (6) in Section 1. Concerning deictic gestures, a constraint that favours affiliation to deictic words over affiliation to stressed words (if they differ at all) seems conceivable nonetheless.


The Situated Prosodic Word Constraint applies to both deictic and iconic gestures. Under certain conditions, however, including when a deictic gesture is direct (i.e., **v** = *f*(**v**)), the temporal and prosodic constraints can be relaxed for pointings.

In order to deal with gestures that are affiliated with expressions larger than single words, Alahverdzhieva et al. (2017) also develop a phrase- or sentence-level integration scheme, where the stressed element has to be a semantic head (in the study of Mehler & Lücking 2012, 18.8% of the gestures had a phrasal affiliate). In this account, the affiliation problem (the second desideratum identified in Section 3.1) has a well-motivated solution on both the word and the phrasal levels, at least for temporally overlapping speech–gesture occurrences (modulo the conditioned relaxations for pointings). Semantic integration of gesture location and verbal meaning (the third basic question from Section 3.1) is brought about using the underspecification mechanism of *Robust Minimal Recursion Semantics* (RMRS), a refinement of *Minimal Recursion Semantics* (MRS) (Copestake et al. 2005) in which, basically, both the scope and the arity of elementary expressions are underspecified (Copestake 2007) – see the RELS and HCONS features in (15). For some background on (R)MRS, see the references given above, or see Koenig & Richter (2021: Section 6.2), Chapter 22 of this volume.
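The arity underspecification at work here can be indicated schematically: in RMRS, a predicate is introduced with a label and an anchor, and its arguments are supplied by separate ARG<sub>n</sub> statements, so that a predication can be stated without fixing how many arguments it takes. A simplified illustration with our own variable names:

$$\ell_1 : a_1 : \textit{mix}(e) \qquad \mathrm{ARG}_1(a_1, x) \qquad \mathrm{ARG}_2(a_1, y)$$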

A dialogue-oriented focus on pointing is taken in Lücking (2018): here, pointing gestures play a role in formulating processing instructions that guide the addressee in where to look for the referent of demonstrative noun phrases.

# **3.5 Iconic gestures**

There is nearly no semantic work on the grounds according to which the meanings assigned to iconic gestures should be assigned to them in the first place (this is the first basic question from Section 3.1). Semantic modelling usually focuses on the interplay of (in this sense presumed) gesture content with speech content, that is, on the third of the basic questions from Section 3.1. Schlenker (2018: 296) is explicit in this respect: "It should be emphasized that we will not seek to explain how a gesture […] comes to have the content that it does, but just ask how this content interacts with the logical structure of a sentence".<sup>8</sup> Two exceptions, however, can be found in the approaches of Rieser (2010) and Lücking (2013; 2016). Rieser (2010) tries to extract a "depiction typology" out of a speech-and-gesture corpus where formal gesture features are correlated with topological clusters consisting of geometrical constructs. Thus, he tries to address the first basic question from Section 3.1 in terms of an empirically extracted gesture typology. These geometrical objects are used in order to provide a possibly underspecified semantic representation for iconic gestures, which is then integrated into word meaning via lambda calculus (Hahn & Rieser 2010; Rieser 2011). The work of Lücking (2013; 2016) is inspired by Goodman's notion of *exemplification* (Goodman 1976): iconic gestures are connected to semantic predicates in terms of a reversed denotation relation, that is, the meaning of an iconic gesture is given in terms of the set of predicates which have the gesture event within their denotation. In order to make this approach work, common perceptual features for predicates are extracted from their denotation and represented as part of a lexical extension of their lexemes, serving as an interface between hand and arm movements and word meanings. This conception in turn is motivated by psychophysical theories of the perception of biological events (Johansson 1973), draws on philosophical similarity conceptions beyond isomorphic mappings (Peacocke 1987),<sup>9</sup> and, using a somewhat related approach, has been proven to work in robotics by means of imagistic description trees (Sowa 2006). These perceptual features serve as the integration locus for iconic gestures, using standard unification techniques. The integration scheme for achieving this is the following one (Lücking 2013: 249) (omitting the time constraint used in the basic integration scheme in (10)):

<sup>8</sup>The omission indicated by "[…]" just contains a reference to an example in the quoted paper.

$$\text{(16)}\quad
\begin{bmatrix}
\textit{sg-ensemble}\\
\text{PHON}\ \boxed{1}\\
\text{CAT}\ \boxed{2}\\
\text{CONT}\ \boxed{3}\begin{bmatrix}\text{RESTR}\ \langle \dots,\ \boxed{4}\,[\textit{pred}\,],\ \dots \rangle\end{bmatrix}\\
\text{S-DTR}\ \begin{bmatrix}\textit{verbal-sign}\\ \text{PHON}\ \boxed{1}\,[\text{ACCENT}\ \boxed{6}\,]\\ \text{CAT}\ \boxed{2}\\ \text{CONT}\ \boxed{3}\end{bmatrix}\\
\text{G-DTR}\ \begin{bmatrix}\textit{g-structure-vec}\\ \text{AFF}\ \langle [\text{PHON}\ [\text{ACCENT}\ \boxed{6}\ \textit{marked}\,]] \rangle\end{bmatrix}
\end{bmatrix}$$

<sup>9</sup>That mere resemblance, usually associated with iconic signs, is too empty a notion to provide the basis for a signifying relation has been emphasized on various occasions (Burks 1949; Bierman 1962; Eco 1976; Goodman 1976; Sonesson 1998).


Comparable to a modifier, a gesture attaches to an affiliate via the feature AFF, which in turn is required to carry intonational accent, expressed in terms of the information packaging developed by Engdahl & Vallduví (1996) (cf. De Kuthy 2021, Chapter 23 of this volume). The semantic contribution of a gesture is made via the new semantic mode *exemplification*, that is, a gesture displays a predication from the RESTR list of its affiliate. The exemplification interface is established using the format of vector semantics developed by Zwarts & Winter (2000) and Zwarts (2003) in order to capture the semantic contribution of locative prepositions, motion verbs and shape adjectives, among other things. This involves two steps: on the one hand, the representation of a gesture (cf. Section 3.3) is mapped onto a vectorial representation; on the other hand, the content of place and form predicates is enriched by abstract psychophysical information in the sense of Johansson (1973) (see above), also spelled out in terms of vector representations. Both steps are illustrated by means of the simple example shown in Figure 12, where the speaker produces a semicircle in both speech and gesture.
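The reversed denotation relation underlying exemplification can be stated compactly (our notation, not Lücking's original formalization): the meaning of an iconic gesture *g* is the set of predicates whose denotation contains the gesture event,

$$⟦g⟧ = \{\, P \mid g \in ⟦P⟧ \,\}$$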

Figure 12: *und* [*oben haben die so'n HALBkreis*] 'and on the top they have such a semicircle' [V20, 6:36].

The kinematic gesture representation of the movement carried out (CARRIER) by the wrist – *move up*, *move left*, *move down* – whose movement steps are concatenated ("⊕") in a bent ("⊕⌣", as opposed to rectangular "⊕⌞") way (cf. also Figure 9), is translated via a vectorising function **V** into a vector trajectory (TRAJ(ECTORY)) from the three-dimensional vector space (cf. Figure 8):<sup>10</sup>

$$\text{(17)}\quad
\begin{bmatrix}
\textit{gesture-vec}\\
\text{TRAJ}\ \mathbf{V}(\boxed{1} \oplus_{\smile} \boxed{2} \oplus_{\smile} \boxed{3}) = \mathbf{UP} \oplus_{\smile} -\mathbf{RT} \oplus_{\smile} -\mathbf{UP}
\end{bmatrix}
\begin{bmatrix}
\textit{gesture}\\
\text{MORPH}\ \big[\text{WRIST}\Vert\text{PATH}\ \boxed{1}\,\textit{mu} \oplus_{\smile} \boxed{2}\,\textit{ml} \oplus_{\smile} \boxed{3}\,\textit{md}\big]
\end{bmatrix}$$

<sup>10</sup>Vectors (typeset in bold face) within gesture space can be conceived of as equivalence classes over concrete movement annotation predicates.
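To make the translation step concrete, the following is a minimal sketch of a vectorising function in the spirit of **V**; the annotation predicates ("mu", "ml", "md"), the direction vectors, and the 180° test are illustrative assumptions, not the actual SaGA/Lücking (2013) translation protocol:

```python
# A minimal sketch of a vectorising function in the spirit of V.
# Predicate names, direction vectors, and the 180-degree test are
# illustrative assumptions, not the actual annotation protocol.
import math

# Illustrative mapping from kinematic annotation predicates to
# direction vectors in gesture space (x: right, y: up, z: sagittal).
DIRECTIONS = {
    "mu": (0.0, 1.0, 0.0),   # move up
    "ml": (-1.0, 0.0, 0.0),  # move left
    "md": (0.0, -1.0, 0.0),  # move down
}

def vectorise(path):
    """Translate a concatenated annotation path into a vector trajectory."""
    return [DIRECTIONS[step] for step in path]

def turning_angle(traj):
    """Sum the angles between successive segment directions (in degrees)."""
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(traj, traj[1:]):
        dot = x1 * x2 + y1 * y2 + z1 * z2
        n1 = math.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
        n2 = math.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
        total += math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))
    return total

trajectory = vectorise(["mu", "ml", "md"])  # move up, move left, move down
angle = turning_angle(trajectory)           # 90 + 90 = 180 degrees
print(f"turning angle: {angle:.0f} degrees; semicircle: {abs(angle - 180.0) < 15.0}")
```

A trajectory whose segment directions sum to roughly 180° of turning would thus fall under the half-circle type mentioned in the following paragraph.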


The lexical entry for *semicircle* is endowed with a *conceptual vector meaning* attribute CVM. Within CVM it is specified (or underspecified) what kind of vector (VEC) is at stake (an axis vector, a shape vector, a place vector) and what it looks like, that is, which PATH it describes. A semicircle can be defined as an axis vector whose path is a 180° trajectory. Accordingly, 180° is the root of a type hierarchy which hosts all vector sequences within gesture space that describe a half circle. This information is added in terms of a form predicate to the restriction list of *semicircle*, as shown in the speech daughter's (S-DTR) content (CONT) value in (18). Licensed by the speech–gesture integration scheme in (16), the half-circular gesture trajectory from (17) and its affiliate expression *semicircle* can enter into an ensemble construction, as shown in (18):


By extending lexical entries with frame information from Frame Semantics (Fillmore 1982), the exemplification of non-overtly-expressed predicates becomes feasible (Lücking 2013: Section 9.2.1); a datum showing this case has already been given with the *spiral staircases* in (5)/Figure 3. A much-improved version of the "vectorisation" of gestures, together with a translation protocol, has been spelled out in Lücking (2016), though within the semantic framework of *Type Theory with Records* (Cooper 2021; Cooper & Ginzburg 2015; cf. also Lücking, Ginzburg & Cooper 2021, Chapter 26 of this volume).


The richer formal, functional and representational features of iconic gestures as compared to deictic gestures (Section 3.4) are accounted for in Alahverdzhieva et al. (2017) by assigning a formal predicate to each "phonological" feature of a gesture representation (cf. Section 3.3). These formal gesture predicates are highly underspecified, using *Robust Minimal Recursion Semantics* (RMRS) (Copestake 2007). That is, they can be assigned various predications of differing arity in the gesture resolution process (predications which are assumed to be constrained by iconicity).

Let us illustrate this by means of Example 1 from Alahverdzhieva et al. (2017: 422), repeated in (19) and adapted to the representational conventions followed in this chapter.

(19) [So he mixes MUD]

*The speaker performs a circular movement with the right hand over the upwards, open palm of the left hand*

Using a variant of a kinematic representation format for gestures (cf. Section 3.3), the right hand from example (19) is notated as follows (Alahverdzhieva et al. 2017: 440):


Each feature–value pair from the gesture's representation in (20) is mapped onto an RMRS-based underspecified representation (Alahverdzhieva et al. 2017: 442):

(21) $$\begin{aligned}
&\ell_0 : a_0 : [\mathrm{G}](h)\\
&\ell_1 : a_1 : \textit{hand\_shape\_bent}(e_1)\\
&\ell_2 : a_2 : \textit{palm\_orient\_towards\_down}(e_2)\\
&\ell_3 : a_3 : \textit{finger\_orient\_towards\_down}(e_3)\\
&\ell_4 : a_4 : \textit{hand\_location\_lower\_periphery}(e_4)\\
&\ell_5 : a_5 : \textit{hand\_movement\_circular}(e_5)\\
&h =_{q} \ell_i \text{ where } 1 \leq i \leq 5
\end{aligned}$$

Note that all predicates mapped from the gesture in (21) fall within the scope of the scopal operator [G]; this prevents an individual introduced by a depicting gesture from being an antecedent of a pronoun in speech.

Regimented by the *Situated Prosodic Word Constraint* from (15), the underspecified semantic description of the gesture in (21) and its affiliated noun *mud* can enter into the multimodal *visualising relation* (*vis\_rel*) construction given in Figure 13 (where the gesture features are partly omitted for the sake of brevity).

Figure 13: Derivation tree for depicting gesture and its affiliate noun *mud* (Alahverdzhieva et al. 2017: 447)

The underspecified RMRS predicates derived from gesture annotations are interpreted according to a type hierarchy rooted in the underspecified logical form features of gestures. For example, the circular hand movement of the "mud gesture" can give rise to two slightly different interpretations. On the one hand, the circular hand movement can depict – in the context of the example – that mud is being mixed from an observer viewpoint (McNeill 1992). This reading is achieved by following the left branch of Figure 14, where the gesture contributes a conjunction of predications that express that a substance rotates. When integrated with speech, the substance resolves to the mud and the rotating event to the mixing. On the other hand, the gesture can depict the event as seen from the character viewpoint (McNeill 1992), which corresponds to the predication from the right branch of Figure 14. Here the rotating event is brought about by the agent x<sub>0</sub>, which is required to be coreferential with *he*, the subject of the utterance.

> *hand\_movement\_circular*(e)
> left branch (observer viewpoint): *substance*(x<sub>0</sub>) ∧ *rotate*(e<sub>0</sub>, x<sub>0</sub>)
> right branch (character viewpoint): *rotate*(e<sub>0</sub>, x<sub>0</sub>, y<sub>0</sub>)

Figure 14: The logical gestural form feature *hand\_movement\_circular*(e) can be expanded into two underspecified RMRS predications.

In addition to addressing the three basic questions identified in Section 3.1 – roughly, the foundation of gesture meaning, the regimentation of affiliation, and the characterisation of semantic integration – another issue has received attention recently, namely the projection behaviour of gestures when they interact with logical operators (Ebert 2014; Schlenker 2018). For instance, the unembedded gesture in (22) triggers the inference that the event being described actually happened in the manner in which it was gesticulated (Schlenker 2018: 303):

(22) John [*slapping gesture*] punished his son.

⇒ John punished his son by slapping him.

That is, (22) more or less corresponds to what semantic speech–gesture integration approaches, as briefly reviewed above, would derive as the content of the multimodal utterance.

Embedding the slapping gesture under the *none*-quantifier triggers, according to Schlenker (2018: 303), the following inference:

(23) None of these 10 guys [*slapping gesture*] punished his son.

⇒ For each of these 10 guys, if he had punished his son, this would have involved some slapping.
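A rough first-order rendering of this universal inference, with illustrative predicate names rather than Schlenker's actual formalization (and simplifying the counterfactual to a material conditional), is:

$$\forall x\, \big( \mathit{guy}(x) \rightarrow (\mathit{punish}(x, \mathit{son}(x)) \rightarrow \mathit{slap}(x, \mathit{son}(x))) \big)$$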

The universal inference patterns with presupposition. Unlike with presupposition, however, Schlenker (2018: 303) claims that the inference is conditionalized on the at-issue contribution of (23), expressed by the *if*-clause. He then develops the notion of "cosupposition", which requires that an expression's local context entail the content of its affiliated gesture. However, as Hunter (2019), among others, argues, conditional presuppositions just follow from general principles of dialogue coherence. So far, however, there is no connection from such projections to HPSG.

Beyond being involved in pragmatic processes like inference, gestures also take part in "micro-evolutionary" developments. Iconic gestures in particular are involved in a short-term dynamic phenomenon: on repeated co-occurrence, iconic gestures and affiliated speech can fuse into a *multimodal ensemble* (Kendon 2004; Lücking et al. 2008; Mehler & Lücking 2012). The characteristic feature of such an ensemble is that its gestural part, its verbal part, or even both parts can be simplified without changing the meaning of the ensemble. Ensembles, thus, are the result of a process of sign formation as studied, for instance, in experimental semiotics (Galantucci & Garrod 2011). Such grammaticalisation processes might eventually lead to conventional signs. However, most conventional, emblematic everyday gestures seem to be the result of circumventing a taboo: something you should not name is gesticulated (Posner 2002).

# **3.6 Other gestures**

According to the taxonomy reviewed in Section 2, there are gestures that, unlike the deictic and iconic ones discussed in the previous sections, do not contribute to propositional content but serve functions bound up with dialogue management. Such gestures have been called *interactive gestures* (Bavelas et al. 1992). Two examples are given in Figures 15 and 16, which have been discussed by Bavelas et al. (1995).

The "delivery gesture" in Figure 15 is used to underline an argument, or to refer to the fact that the current issue is known to the interlocutors. In the latter function, the gesture is also termed *shared information gesture*.

Figure 15: "Here's my point."

The 'open hand' pointing gesture in Figure 16 acts as a turn-taking device: it can function as a turn-assigning gesture (underlined by the caption of Figure 16), or, when used to point at the current speaker, it can also indicate that the gesturer wants to take the turn and address the current turn holder.


Figure 16: "You go ahead."

So far there is no account of interactive gestures in HPSG. Given their entrenchment in dialogue processes, their natural home seems to be dialogue theory anyway (see Lücking, Ginzburg & Cooper 2021, Chapter 26 of this volume). Accordingly, what is presumably the only formal approach to some of these gestures has been spelled out within the dialogical framework PTT in Rieser & Poesio (2009).

# **4 Gesture and …**

Besides being of genuine linguistic and theoretical interest, gesture studies are a common topic in various areas of investigation, some of which are briefly surveyed below.

# **4.1 … tools, annotation, corpora**

Since gestures are signs in the visual modality, they have to be videotaped. Gesture annotation is carried out on the recorded videos. The main tools that allow for video annotation are, in alphabetical order, Anvil<sup>11</sup>, ELAN<sup>12</sup> and EXMARaLDA<sup>13</sup>.

Annotation should follow an annotation standard which is specified in an annotation scheme. Various annotation schemes for gestures and speech–gesture integration have been proposed, partly differing in their annotation foci. Annotation schemes that focus on form description and gesture classification in terms of a taxonomy like the one introduced in Section 2 have been developed by R. Breckenridge Church (published in the appendix of McNeill 1992), CoGEST (Gibbon et al. 2003), FORM (Martell et al. 2002) and the SaGA annotation (Lücking et al. 2013). The form of gestures and their timing with speech

<sup>11</sup>https://www.anvil-software.org/ (Kipp 2014).

<sup>12</sup>https://tla.mpi.nl/tools/tla-tools/elan/, Max Planck Institute for Psycholinguistics, The Language Archive, Nijmegen, The Netherlands (Sloetjes & Wittenburg 2008).

<sup>13</sup>https://exmaralda.org/ (Schmidt 2012).


is the object of the coding scheme of Kipp et al. (2007). An interaction-oriented scheme has been proposed by Allwood et al. (2007), which is formulated on the level of turns and dialogue management. A detailed annotation scheme for the form and function of gestures has been developed in terms of "annotation decision trees" within the NEUROGES system (Lausberg & Sloetjes 2009).

Annotated videos of real-life interactions give rise to so-called multimodal corpora. Among those that include data on gestures are the following. The multimodal SmartKom Corpus (Schiel et al. 2003), which grew out of the SmartKom project (Wahlster 2006), comprises recording sessions of various Wizard-of-Oz experiments (that is, human–computer interaction where the human participants are made to believe that the system they interact with is autonomous while in fact it is, at least partly, operated by another human). The recordings are basically extended by a transliteration and labelling of natural speech, a labelling of gestures, and an annotation of user states (in the corpus's first release). The first public release, SKP 1.0, contains 90 recording sessions of 45 users. The multimodal SmartKom corpus as well as further SmartKom resources are hosted at the *Bavarian Archive for Speech Signals* (https://www.bas.uni-muenchen.de/Bas/).

The AMI Meeting Corpus (Carletta et al. 2006) consists of 100 hours of meeting recordings. The meetings were recorded in English but include mostly non-native speakers. The AMI Meeting Corpus provides orthographic transcriptions, but also has a number of further annotations, including dialogue acts, named entities, head gesture, hand gesture, gaze direction, movement and emotional states.

The SaGA ("Speech and Gesture Alignment") corpus consists of 24 German route direction dialogues obtained after a bus ride through a virtual town (Lücking et al. 2010). Audio and video data from the direction-giver were recorded. The SaGA corpus comprises 280 minutes of video material containing 4,961 iconic/deictic gestures, approximately 1,000 discourse gestures and 39,435 word tokens (Lücking et al. 2013). Gesture annotation has been carried out in great detail, following a kinematic, form-based approach (cf. the above remark on annotation schemes). Part of the SaGA corpus is available from the *Bavarian Archive for Speech Signals* (https://www.bas.uni-muenchen.de/Bas).

The DUEL ("Disfluency, exclamations and laughter in dialogue") corpus (Hough et al. 2016) comprises 24 hours of natural, face-to-face dialogue in German, French and Mandarin Chinese. It includes audio, video and body tracking data and is transcribed and annotated for disfluency, laughter and exclamations.

The FIGURE (derived from "Frankfurt Image GestURE") corpus (Lücking et al. 2016) is built on recordings of 50 participants with various mother tongues (though mostly German) spontaneously producing gestures in response to five or six terms from a total of 27 stimulus terms, which have been compiled mainly from image schemata (Lakoff 1987: 267). The gestures have been kinematically annotated by means of a variant of the SaGA annotation scheme. The FIGURE annotation is available from the Text Technology Lab Frankfurt (https://www.texttechnologylab.org/applications/corpora).

# **4.2 … robots and virtual agents**

In the context of Human-Computer Interaction (HCI) or Human-Robot Interaction (HRI), gesture plays an important role (in fact, the formal modelling of deictic and iconic gestures was initiated in these fields, cf. Section 3.2). One reason for this prominence of gesture in technical areas is that people who interact with a robot evaluate it more positively when the robot displays non-verbal behaviours such as hand and arm gestures along with speech (see e.g. Salem et al. 2012). Within HCI/HRI, two distinctions have to be made. The first is between "robot" in the sense of a virtual avatar and "robot" in the (probably more common) sense of a physical device (only the latter will henceforth be called a "robot"). The second distinction discerns gesture generation from gesture recognition. Given this simple systematization, four combinations of gesture and virtual avatars/robots arise (the references are merely exemplary and preferably from earlier HCI/HRI times): (i) gesture generation by robots (e.g. Le et al. 2011); (ii) gesture recognition by robots (e.g. Triesch & von der Malsburg 1998); (iii) gesture generation by virtual avatars (e.g. Cassell et al. 2000); and (iv) gesture recognition in VR/AR (e.g. Weissmann & Salomon 1999). For a more detailed overview, see Lücking & Pfeiffer (2012). Enabling humans to act and interact in virtual rooms (e.g. Pfeiffer et al. 2018) can be seen as a recent extension of gesture use in HCI/HRI.

In order to plan and design the speech/gesture output of a virtual avatar or a robot, a multimodal representation format is required. To this end, the *Multimodal Utterance Representation Markup Language* for conversational robots (MURML) has been developed (Kranstedt et al. 2002). A similar purpose is served by the *Extensible MultiModal Annotation* (EMMA; Johnston 2009).

# **4.3 … learning**

Following a "gesture as a window to the mind" view, gestures must be a prime object of educational theory and practice, and they indeed are, as demonstrated by the research of Cook & Goldin-Meadow (2006) and colleagues. The effectiveness of gestures has been studied in math lessons (Goldin-Meadow et al. 2001), in the acquisition of counting competence (Alibali & DiRusso 1999) and in bilingual education (Breckinridge Church et al. 2004), among other areas. The fairly unanimous result is that gestures can indeed reflect students' conceptualisations and provide insights into cognitive processes involved in learning. Therefore, they can be used as a teaching device as well as an indicator of learning progress and understanding.

# **4.4 … aphasia**

Current models of utterance production are speech–gesture production models, assuming a (more or less) integrated generation of multimodal utterances. Based on such models, one expects an effect on gesture performance when speech production is impaired, as is the case with aphasic speakers. Aphasia is an acquired speech disorder, which can be caused by a stroke, ischaemia, haemorrhage, craniocerebral trauma and other brain-damaging diseases. Different speech–gesture production models make slightly different predictions for speakers suffering from aphasia and can be evaluated accordingly (de Ruiter & de Beer 2013). Indeed, observing the gesture behaviour of aphasic speakers is one aspect of gesture and aphasia research (Jakob et al. 2011; Kong et al. 2017; Sekine & Rose 2013). With the exception of the growth point theory, speech–gesture production models are based on Levelt's (1989) model.

The *Growth Point model* (McNeill & Duncan 2000) assumes that the "seed" of an utterance is an inherently multimodal idea unit that comprises imagistic as well as symbolic proto-representations which unfold into gesture and speech respectively in the process of articulation (see also Röpke 2011 on the growth point's entrenchment in contexts and frames).

The *Sketch model* (de Ruiter 2000) explicitly reflects different kinds of gestures (see Section 2). Its name is due to the sketch component, an abstract spatiotemporal representation alongside Levelt's preverbal message. The sketch is sent to a gesture planner while, independently, the preverbal message is processed by the formulator.

According to the *Lexical Access model* of Krauss et al. (2000), iconic gestures are related to words and are used in order to facilitate speaker-internal word retrieval rather than to communicate pictorial information.

The *Interface model* (Kita & Özyürek 2003) assumes that the processes for speech and gesture generation negotiate with each other and therefore can influence each other during the production phase.

Other aspects include the use of gesture in speech therapy. Very much in line with the Lexical Access model, gestures have been used in order to facilitate word retrieval in what can be called *multimodal therapy* (Rose 2006). Following a different strategy, gestures are also used in order to enhance the communicative range of patients: they learn to employ gestures instead of words in order to communicate at least some of their needs and thoughts more fluently (Cubelli et al. 1991; Caute et al. 2013).

However, just counting on gestures in therapy does not automatically lead to success (Auer & Bauer 2011). The type and severity of aphasia, the individual traits of the aphasic speaker and the kinds of gestures impaired or still at her or his disposal, among other factors, seem to constitute a complex network for which currently no generally applicable clinical pathway can be given.

# **5 Outlook**

What are (still) challenging issues with respect to grammar-gesture integration, in particular from a semantic point of view? Candidates include:


Finally, the empirical domain of "gesture" has to be extended to other nonverbal signals, in particular propositional ones such as laughter (Ginzburg et al. 2015), facial expressions or gaze (see Section 1 for a brief list of non-verbal signals), in isolation as well as in mutual combinations. Thus, there is still some way to go in order to achieve a fuller understanding of natural language interactions and thereby natural languages.

# **Acknowledgments**

This work is partially supported by a public grant overseen by the French National Research Agency (ANR) as part of the program "Investissements d'Avenir" (reference: ANR-10-LABX-0083). It contributes to the IdEx Université de Paris – ANR-18-IDEX-0001. For insightful comments on earlier drafts I want to thank Anne Abeillé, Jonathan Ginzburg, Alex Lascarides, Stefan Müller, Hannes Rieser, and Markus Steinbach. They helped to improve the chapter a lot. Needless to say, all remaining oddities or shortcomings are my own. Furthermore, I am grateful to Elizabeth Pankratz for attentive remarks and for proofreading.

# **References**



Argyle, Michael. 1975. *Bodily communication*. New York, NY: Methuen & Co.



(Contributions to the Sociology of Language 25), 101–117. The Hague, The Netherlands: Mouton. DOI: 10.1515/9783110813098.101.



Foley & Jim Hollan (eds.), *Proceedings of the fifth ACM international conference on multimedia (multimedia '97)*, 31–40. Seattle, WA: Association for Computing Machinery. DOI: 10.1145/266180.266328.



Cambridge, UK: Cambridge University Press. DOI: 10.1017/CBO9780511620850.018.



*computational linguistics*, vol. 1 (ACL '98), 624–630. Montreal, Quebec, Canada: Association for Computational Linguistics. DOI: 10.3115/980845.980949.





New York, NY: Association for Computing Machinery. DOI: 10.1145/1027933.1027952.





(eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 1155–1199. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599870.



*Where language, culture, and cognition meet*, 69–84. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.



Communication Science (HSK) 38). Berlin: De Gruyter Mouton. DOI: 10.1515/9783110261318.



*ics of dialogue* (Catalog '04), 93–100. Barcelona: Department of Translation & Philology Universitat Pompeu Fabra, Barcelona.






# **Part V The broader picture**

# **Chapter 28**

# **HPSG and Minimalism**

# Robert D. Borsley

University of Essex and Bangor University

# Stefan Müller

Humboldt-Universität zu Berlin

This chapter compares work done in Head-Driven Phrase Structure Grammar with work done under the heading of the *Minimalist Program*. We discuss differences in the respective approaches and in the outlook of the theories. We look at procedural versus constraint-based views of grammar and discuss differences in the complexity of the structures that are assumed. We also address psycholinguistic issues like processing and language acquisition.

# **1 Introduction**

The Minimalist framework, which was first outlined by Chomsky in the early 1990s (Chomsky 1993; 1995b), still seems to be the dominant approach in theoretical syntax. It is important, therefore, to consider how HPSG compares with this framework. In a sense, both frameworks are descendants of the transformational-generative approach to syntax, which Chomsky introduced in the 1950s. HPSG is a result of the questioning of transformational analyses<sup>1</sup> that emerged in the late 1970s. This led to Lexical Functional Grammar (Bresnan & Kaplan 1982) and Generalized Phrase Structure Grammar (Gazdar, Klein, Pullum & Sag 1985), and then in the mid-1980s to HPSG (Pollard & Sag 1987; see Flickinger, Pollard & Wasow

<sup>1</sup>By *transformational analyses* we mean analyses which derive structures from structures, especially by movement, whether the movement is the product of transformational rules, a general license to move, or the Internal Merge mechanism.

Robert D. Borsley & Stefan Müller. 2021. HPSG and Minimalism. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1253–1329. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599874


2021, Chapter 2 of this volume for more on the origins of HPSG).<sup>2</sup> Minimalism in contrast remains committed to transformational, i.e., movement, analyses. It is simpler in some respects than the earlier Government & Binding framework (Chomsky 1981), but as we will see below, it involves a variety of complexities.

The relation between the two frameworks is clouded by the discourse that surrounds Minimalism. At one time "virtual conceptual necessity" was said to be its guiding principle. A little later, it was said to be concerned with the "perfection of language", with "how closely human language approaches an optimal solution to design conditions that the system must meet to be usable at all" (Chomsky 2002: 58). Much of this discourse seems designed to suggest that Minimalism is quite different from other approaches and should not be assessed in the same way. In the words of Postal (2003: 19), it looks like "an attempt to provide certain views with a sort of privileged status, with the goal of placing them at least rhetorically beyond the demands of serious argument or evidence". However, the two frameworks have enough in common to allow meaningful comparisons.

Both frameworks seek to provide an account of what is and is not possible both in specific languages and in language in general. Moreover, both are concerned not just with local relations such as that between a head and its complement or complements, but also with non-local relations such as those in the following:

(1) a. The student knows the answer.
	b. It seems to be raining.
	c. Which student do you think knows the answer?

In (1a), *the student* is subject of *knows* and is responsible for the fact that *knows* is a third person singular form, but *the student* and *knows* are not sisters if *knows* and *the answer* form a VP. In (1b) the subject is *it* because the complement of *be* is *raining* and *raining* requires an expletive subject, but *it* and *raining* are obviously not sisters. Finally, in (1c), *which student* is understood as the subject of *knows* and is responsible for the fact that it is third person singular, but again the two elements are structurally quite far apart. Both frameworks provide analyses

<sup>2</sup>We make no attempt to provide an introduction to HPSG in this chapter. For an introduction to the various aspects of the framework, see the other chapters of this handbook. For example, non-transformational analyses of the passive are dealt with in Davis, Koenig & Wechsler (2021: Section 5.3), Chapter 9 of this volume, constituent order in Müller (2021b), Chapter 10 of this volume, and unbounded dependencies in Borsley & Crysmann (2021), Chapter 13 of this volume. The question of whether scrambling, passive, and nonlocal dependencies should be handled by the same mechanism (e.g., transformations) or whether these phenomena are distinct and should be analyzed by making use of different mechanisms is discussed in Müller (2020: Chapter 20).


for these and other central syntactic phenomena, and it is quite reasonable to compare them and ask which is the more satisfactory.<sup>3</sup>

Although HPSG and Minimalism have enough in common to permit comparisons, there are obviously many differences. Some are more important than others, and some relate to the basic approach and outlook, while others concern the nature of grammatical systems and syntactic structures. In this chapter we will explore the full range of differences.

The chapter is organized as follows: in Section 2, we look at differences of approach between the two frameworks. Then in Section 3, we consider the quite different views of grammar that the two frameworks espouse, and in Section 4, we look at the very different syntactic structures which result. Finally, in Section 5, we consider how the two frameworks relate to psycholinguistic issues, especially processing and language acquisition.

# **2 Differences of approach and outlook**

This section deals with some higher level differences between the two frameworks. We start with the degree of formalization and the range of data that is covered (Section 2.1). Section 2.2 discusses the quality of empirical work. Finally, Section 2.3 deals with arguments for invisible entities and innate knowledge.

# **2.1 Formalization and exhaustivity**

As many of the chapters in this volume emphasize, HPSG is a framework which places considerable emphasis on detailed formal analyses of the kind that one might expect within Generative Grammar.<sup>4</sup> Thus, it is not uncommon to find lengthy appendices setting out formal analyses. See, for example, Sag's (1997) paper on English relative clauses, Van Eynde's (2015) book on predicative

<sup>3</sup>As noted below, comparison is complicated somewhat by the fact that Minimalists typically provide only sketches of analyses in which various details are left quite vague.

<sup>4</sup>We follow Ginzburg & Sag (2000: 2) in counting HPSG among Generative Grammar in the sense defined by Chomsky (1965: 4), namely as a framework that provides an explicit characterization of the theories developed within it. When we refer to work in Government & Binding or Minimalism, we follow Culicover & Jackendoff (2005: 3) in using the term *Mainstream Generative Grammar* (MGG). It should be kept in mind that there is another meaning associated with the term *generative*. A generative grammar in the latter sense generates a set (Chomsky 1957: 13). HPSG is not generative in this sense but rather model-theoretic. See Pullum & Scholz (2001) for differences between generative-enumerative and model-theoretic approaches. See also Richter (2021), Chapter 3 of this volume and Wasow (2021: Section 3.1), Chapter 24 of this volume.


constructions, and especially Ginzburg & Sag (2000), which has a 50-page appendix. One consequence of this is that HPSG has had considerable influence in computational linguistics. Sometimes theoretical work comes paired with computer implementations, which show that the analyses are consistent and complete, e.g., all publications coming out of the CoreGram project (Müller 2015c) and the HPSG textbook for German that comes with implementations corresponding to the individual chapters of the book (Müller 2007b). It has been noticed both by theoretical linguists (Bierwisch 1963: 163) and by theoretically-oriented computational linguists (Abney 1996: 20) that the interaction of phenomena is so complex that most normal human beings cannot deal with this complexity, and that formalization and implementation actually help enormously in understanding language in its full depth. For more on the relation of HPSG and computational linguistics, see Bender & Emerson (2021), Chapter 25 of this volume.

In Minimalism things are very different. Detailed formal analyses are virtually non-existent. There appear to be no appendices like those in Sag (1997) and Ginzburg & Sag (2000). In fact, the importance of formalization has long been downplayed in Chomskyan work, e.g., by Chomsky in an interview with Huybregts & Riemsdijk (1982: 73) and in discussions between Pullum (1989) and Chomsky (1990: 146), and this view seems fairly standard within Minimalism; see also the discussion in Müller (2020: Section 3.6.2). Chomsky & Lasnik (1995: 28) attempt to justify the absence of detailed analyses when they suggest that providing a rule system from which some set of phenomena can be derived is not "a real result" since "it is often possible to devise one that will more or less work". Instead, they say, "the task is now to show how the phenomena […] can be deduced from the invariant principles of UG [Universal Grammar] with parameters set in one of the permissible ways". Postal (2004: 5) comments that what we see here is the "notion that descriptive success is not really that hard and so not of much importance". He points out that if this were true, one would expect successful descriptions to be abundant within transformational frameworks. He argues that actual transformational descriptions are quite poor, and justifies this assessment with detailed discussions of Chomskyan work on strong crossover phenomena and passives in Chapters 7 and 8 of his book.

There has also been a strong tendency within Minimalism to focus on just a subset of the facts in whatever domain is being investigated. As Culicover & Jackendoff (2005: 535) note, "much of the fine detail of traditional constructions has ceased to garner attention". This tendency has sometimes been buttressed by a distinction between core grammar, which is supposedly a fairly straightforward reflection of the language faculty, and a periphery of marked constructions, which are of no great importance and which can reasonably be ignored. However, as Culicover (1999) and others have argued, there is no evidence for a clear cut distinction between core and periphery. It follows that a satisfactory approach to grammar needs to account both for such core phenomena as *wh*-interrogatives, relative clauses, and passives and also for more peripheral phenomena such as the following:

	- b. The more I read, the more I understand.
	- c. Chris lied his way into the meeting.

These exemplify the nominal extraposition construction (Michaelis & Lambrecht 1996), the comparative correlative construction (Culicover & Jackendoff 1999; Borsley 2011), and the *X's Way* construction (Salkoff 1988; Sag 2012). As has been emphasized in other chapters, the HPSG system of types and constraints is able to accommodate broad linguistic generalizations, highly idiosyncratic facts, and everything in between.

The general absence in Minimalism of detailed formal analyses is quite important. It means that Minimalists may not be fully aware of the complexity of the structures they are committed to, and this allows them to sidestep the question of whether this complexity is really justified. It also allows them to avoid the question of whether the very simple conception of grammar that they favour is really satisfactory. Finally, it may be that they are unaware of how many phenomena remain unaccounted for. These are all important matters.

The general absence of detailed formal analyses has also led to Minimalism having little impact on computational linguistics. There has been some work that has sought to implement Minimalist ideas (Stabler 2001; Fong & Ginsburg 2012; Fong 2014; Torr 2019), but Minimalism has not had anything like the productive relation with computational work that HPSG has enjoyed (see Bender & Emerson 2021, Chapter 25 of this volume). Existing Minimalist implementations are, rather, toy grammars analyzing very simple sentences; some are not faithful to the theories they are claimed to be implementing,<sup>5</sup> and some do not even parse natural language but require pre-segmented, pre-formatted input. For example, Stabler's test sentences have the form shown in (3).

<sup>5</sup>Fong's grammars are simple Definite Clause Grammars, that is, context-free phrase structure grammars, and hence nowhere near an implementation of Minimalism, contrary to claims by Berwick, Pietroski, Yankama & Chomsky (2011: 1221). Lin's parsers PrinciPar and MiniPar (1993; 2003) are based on GB and Minimalism but according to Lin (1993: 116) and Torr et al. (2019: 2487), they are not transformational but use a SLASH passing mechanism like the one developed in GPSG (Gazdar 1981) and standardly used in HPSG (see Borsley & Crysmann 2021, Chapter 13 of this volume).


	- b. the king have -s eat -en
	- c. the king be -s eat -ing
	- d. the king -s will -s have been eat -ing the pie

See Müller (2020: Section 4.7.2) for discussion. Torr implemented a large-scale grammar (Torr, Stanojevic, Steedman & Cohen 2019: 2487; Torr 2019), but he also uses a SLASH passing mechanism and "around 45" versions of Move and Merge (Torr et al. 2019: 2488) in comparison to the two versions usually assumed in Minimalism (Move and Merge, or Internal and External Merge). Torr's work cannot be discussed here in detail due to space limitations; see Müller (2020: 177–180) for a more extensive discussion. Müller shows that Torr's MG derivations are equivalent to an HPSG analysis assuming Reape-style discontinuous constituents (Reape 1994; Müller 2021b: Section 6, Chapter 10 of this volume) and a SLASH passing mechanism.

Summing up: the fact that certain variants of Minimalism share properties with Categorial Grammar was noticed early on (Berwick & Epstein 1995). Directional Minimalist Grammars were compared to CG and HPSG by Müller (2013: Section 2.3). Minimalist Grammars (MGs) were extended to include GPSG-style SLASH passing mechanisms by Kobele (2008), and they continue to use SLASH passing in the versions of Torr & Stabler (2016) and Torr (2019). We believe that this work is fruitful and well-formalized, but most work in Minimalism remains insufficiently formalized, and ideas from other frameworks are more often than not ignored.

# **2.2 Empirical quality**

There are, then, issues surrounding the quantity of data that is considered in Minimalist work. There are also issues surrounding its quality (Schütze 2016). Research in HPSG is typically quite careful about data and often makes use of corpus and experimental data (see for example An & Abeillé 2017; Müller 1999b; 2002; Bildhauer & Cook 2010; Müller, Bildhauer & Cook 2012; Chaves 2013; Miller 2013; Van Eynde 2015: Chapter 7; Abeillé et al. 2016; Shiraïshi et al. 2019; Winckel 2020 for examples of work with attested examples and for experimental work). This use of corpus data and attested examples is based on the insight that introspection alone is not sufficient: an enormous amount of time is spent on working out analyses, and it would be unfortunate if these analyses were built on a shaky empirical basis. See Müller (2007a) and Meurers & Müller (2009) for discussion of introspection vs. corpus data and Hofmeister & Sag (2010) and Gibson & Fedorenko (2013) for discussion of introspection vs. controlled experimental data. Research in Minimalism is often rather less careful.<sup>6</sup> In a review of a collection of Minimalist papers, Bender (2002: 434) comments that: "In these papers, the data appears to be collected in an off-hand, unsystematic way, with unconfirmed questionable judgments often used at crucial points in the argumentation". She goes on to suggest that the framework encourages "lack of concern for the data, above and beyond what is unfortunately already the norm in formal syntax, because the connection between analysis and data is allowed to be remote". Similar things could be said about a variety of Minimalist work. Consider, for example, Aoun & Li (2003), who argue for quite different analyses of *that*-relatives and *wh*-relatives on the basis of the following (supposed) contrasts, which appear to represent nothing more than their own judgements (pp. 110–112):

(4) a. The headway that Mel made was impressive.
	b. ?? The headway which Mel made was impressive.

(5) a. We admired the picture of himself that John painted in art class.
	b. \* We admired the picture of himself which John painted in art class.

None of the native speakers we have consulted find significant contrasts here which could support different analyses. The example in (7a), with a *which* relative clause referring to *headway*, can be found in Cole et al. (1982). Williams (1989: 437) and Falk (2010: 221) have examples with a reflexive coreferential with a noun in a relative clause introduced by *which*, as in Williams's (7b), and corpus examples like (7c, d) can be found as well:

(7) a. the headway which Mel made
	b. the picture of himself which John took
	c. The words had the effect of lending an additional clarity and firmness of outline to the picture of himself which Bill had already drawn in his mind–of a soulless creature sunk in hoggish slumber.<sup>7</sup>

<sup>6</sup>We hasten to say that we do not claim this to be true for all Minimalist work. There are researchers working with corpora or at least with attested examples (Wurmbrand 2003), and there is experimental work. Especially in Germany, there were several large-scale Collaborative Research Centers with a strong empirical focus, which also fed back into theoretical work, including Minimalist work. The point we are making here is that there is work, including work by prominent Minimalists, that is rather sloppy as far as the data is concerned.

<sup>7</sup>Wodehouse, P.G. 1917. *Uneasy Money*, London: Methuen & Co., p. 186, http://www.literaturepage.com/read.php?titleid=uneasymoney&abspage=186, 2021-02-01.


d. She refused to allow the picture of himself, which he had sent her, to be hung, and it was reported that she ordered all her portraits and busts of him to be put in the lumber attics.<sup>8</sup>

Given that it is relatively easy to come up with counterexamples, it is surprising that authors do not do a quick check before working out rather complex analyses.

Note that we are not just picking one bad example of Minimalist work. It is probably the case that papers with dubious judgments can be found in any framework, if only due to the repetition of unwarranted claims made by others. The point is that Aoun & Li are influential (cited by 534 other publications as of February 2, 2021). Others rely on these judgments or on the analyses that were motivated by them. New conclusions are derived from the analyses, since theories make predictions. If this process continues for a while, an elaborate theoretical edifice results that is not empirically supported. Note furthermore that the criticism raised here is not the squabble of two authors working in an alternative framework. This criticism also comes from practitioners of Mainstream Generative Grammar. For example, Wolfgang Sternefeld and Hubert Haider, both very prominent figures in the German Generative Grammar school, have heavily criticized the scientific standards in Minimalism (Sternefeld & Richter 2012; Haider 2018).

As we will show in Section 3.4, Minimalist discussions of the important topic of labelling have also been marred by a failure to take relevant data into account.

# **2.3 Argumentation for invisible entities and the assumption of innate linguistic knowledge**

There are also differences in the kind of arguments that the two frameworks find acceptable. It is common within Minimalism to assume that some phenomenon which cannot be readily observed in some languages must be part of their grammatical system because it is clearly present in other languages. Notable examples would be case (Li 2008) or (object) agreement (Meinunger 2000: Chapter 4), which are assumed to play a role even though there are no visible manifestations within some languages (e.g., Mandarin Chinese and German, respectively). This stems from the longstanding Chomskyan assumption that language is the realization of a complex innate language faculty. From this perspective, there is much in any grammatical system that is a reflection of the language faculty and not in any simple way a reflection of the observable phenomena of the language

<sup>8</sup> Jerrold, Clare. 1913. *The married life of Queen Victoria*, London: G. Bells & Sons, Ltd. https://archive.org/stream/marriedlifeofque00jerruoft/marriedlifeofque00jerruoft\_djvu.txt, 2021-02-01.


in question. If some phenomenon plays an important role in many languages, it is viewed as a reflection of the language faculty, and hence it must be a feature of all grammatical systems, even those in which any evidence for it is hard to see. An example – taken from a textbook on Minimalism (Hornstein, Nunes & Grohmann 2005: 124) – is an analysis of prepositional phrases in English. Figure 1 shows the analysis.<sup>9</sup> Due to theory-internal assumptions, the case requirement

Figure 1: Minimalist analysis of a PP according to Hornstein, Nunes & Grohmann (2005: 124) and the analysis assumed in HPSG and all other phrase-structure-based frameworks

of the preposition cannot be checked in the P-DP combination. According to the version of the theory adopted by the authors, case has to be checked in specifier positions. Therefore it was assumed that the preposition moves to an Agr head and the DP moves to the specifier position of this Agr head. The problem is, of course, that DP and P are in the wrong order now. However, the authors argue that this is the order that is manifested in Hungarian, and that Hungarian is a language which has postpositions, and these agree with their nominal dependent.

<sup>9</sup>This analysis is actually a much simpler variant of the PP analysis which appeared in an earlier textbook by Radford (1997: 452). For discussion of this analysis, see Sternefeld (2006: 549–550) and Müller (2016: Section 4.6.1.2). We are aware of the fact that Minimalism has developed further since 1997 and 2005 and that some Agr projections have been replaced by other mechanisms, but first, this is not true for all analyses (see for example Carnie 2013), and second, the way analyses are argued for has not changed.


The authors assume that Hungarian postpositions are prepositions underlyingly and that the DP following the preposition moves to the left because of a movement process that is triggered by agreement. It is claimed that this movement exists both in Hungarian and in English but that the movement is covert (that is, invisible) in the latter language.

This line of argument would be reasonable if a complex innate language faculty were an established fact, but it isn't, and since Hauser, Chomsky & Fitch (2002), it seems to have been rejected within Minimalism. It follows that ideas about an innate language faculty should not be used to guide research on individual languages. Rather, as Müller (2015c: 25) puts it, "grammars should be motivated on a language-specific basis", a view already entertained by Boas (1911: 35, 43). Does this mean that other languages are irrelevant when investigating a specific language? Clearly not. As Müller also says, "In situations where more than one analysis would be compatible with a given dataset for language X, the evidence from language Y with similar constructs is most welcome and can be used as evidence in favor of one of the two analyses for language X" (2015c: 43). In practice, any linguist working on a new language will use apparently similar phenomena in other languages as a starting point. It is important, however, to recognize that apparently similar phenomena may turn out upon careful investigation to be significantly different.<sup>10</sup>

# **3 Different views of grammar**

We turn now to more substantive differences between HPSG and Minimalism: differences in their conceptions of grammar, especially syntax, and differences in their views of syntactic structure. As we will see, these differences are related. In this section we consider the former, and in the next we will look at the latter.

# **3.1 Declarative and constraint-based vs. derivational and generative-enumerative approaches**

As is emphasized throughout this volume, HPSG assumes a declarative or constraint-based view of grammar. It also assumes that the grammar involves a complex system of types and constraints. Finally, it assumes that syntactic analyses are complemented by separate semantic and morphological analyses. In each of these areas, Minimalism is different. It assumes a procedural view of grammar. It assumes that grammar involves just a few general operations. Finally, it assumes that semantics and morphology are simple reflections of syntax. We comment on each of these matters in the following subsections.

<sup>10</sup>Equally, of course, apparently rather different phenomena may turn out on careful investigation to be quite similar. For further discussion of HPSG and comparative syntax, see Borsley (2020).


Whereas HPSG is a declarative or constraint-based approach, Minimalism seems to be firmly committed to a procedural approach. Chomsky (1995b: 219) remarks: "We take L [a particular language] to be a generative procedure that constructs pairs (π, λ) that are interpreted at the articulatory-perceptual (A-P) and conceptual-intentional (C-I) interfaces, respectively, as 'instructions' to the performance systems". Various arguments have been presented within HPSG for a declarative view, but no argument seems to be offered within Minimalism for a procedural view. Obviously, speakers and hearers do construct representations and must have procedures that enable them to do so, but this is a matter of performance, and there is no reason to think that the knowledge that is used in performance has a procedural character (see Section 5.1 on processing). Rather, the fact that this knowledge is used in both production and comprehension suggests that it should be neutral between the two and hence declarative. See also Wasow (2021: Section 3.1), Chapter 24 of this volume on this point.

Another difference between constraint-based and generative-enumerative approaches is that the first type of proposal provides a way to get graded acceptability into the picture (Pullum & Scholz 2001: Section 3.1). Since HPSG grammars are basically feature-value pairs with equality (or other relations) between values, it is possible to weight constraints, admit constraint violations, and work with structures with violated constraints (see for example Sorace & Keller 2005 on cumulative constraint violation). So looking at the sentences in (8), we see that more and more constraints are violated:

	- a. I am the chair of my department.
	- b. \* I are the chair of my department.
	- c. \* Me are the chair of my department.
	- d. \* Me are the chair of me's department.
	- e. \* Me are chair the of me's department.
	- f. \* Me are chair the me's department of.

(8b) violates constraints on subject-verb agreement (Wechsler 2021: Section 2, Chapter 6 of this volume), (8c) additionally violates constraints on case assignment (Przepiórkowski 2021, Chapter 7 of this volume), (8d) additionally has a pronoun with the possessive marker instead of a possessive pronoun, (8e) additionally violates the linearization constraint regarding determiners and nouns (Müller 2021b, Chapter 10 of this volume), and (8f) additionally violates the order constraints on prepositions and the NPs depending on them. By assuming (differently) weighted constraints for agreement, case assignment, selection, and ordering, one can capture the difference in acceptability of the sequences in (8).

In comparison to this, a generative-enumerative grammar enumerates a set, and a sequence either is in the set or it is not.<sup>11</sup>
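To make the contrast concrete, the following toy sketch (in Python; the constraint names and weights are invented purely for illustration and are not drawn from any actual HPSG grammar) shows how weighted constraint violations yield graded acceptability, whereas a generative-enumerative grammar can only return a yes/no verdict:

```python
# Toy illustration of graded acceptability via weighted constraint
# violations. Constraint names and weights are invented for exposition.
WEIGHTS = {
    "agreement": 1.0,       # subject-verb agreement, violated in (8b)
    "case": 1.5,            # case assignment, additionally violated in (8c)
    "possessive": 1.0,      # possessive pronoun form, cf. (8d)
    "det-noun-order": 2.0,  # determiner precedes noun, cf. (8e)
    "prep-np-order": 2.0,   # preposition precedes its NP, cf. (8f)
}

def acceptability(violations):
    """Graded score: 1.0 = fully acceptable; lower = less acceptable."""
    return 1.0 / (1.0 + sum(WEIGHTS[v] for v in violations))

print(acceptability([]))             # 1.0, cf. (8a)
print(acceptability(["agreement"]))  # 0.5, cf. (8b)
print(acceptability(list(WEIGHTS)))  # ~0.12, all five violated, cf. (8f)

# A generative-enumerative grammar, by contrast, only distinguishes
# "in the set" from "not in the set":
def grammatical(violations):
    return not violations
```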

For further discussion of the issues, see Section 5.1 of this chapter and e.g., Pullum & Scholz (2001), Postal (2003), Sag & Wasow (2011: Section 10.4.2; 2015: 52), and Wasow (2021: Section 3.1), Chapter 24 of this volume.

# **3.2 Underspecification**

Another crucial difference between HPSG and Minimalism is that HPSG allows for the underspecification of information. In the absence of constraints, all options are in principle possible. This is different in Minimalism. All structures that are derivable are predetermined by the numeration (one of the various sets of items preselected from the lexicon for an analysis; see also fn. 40). Features have to be specified, and they determine movement and the properties of the derived objects. The general characterization of the frameworks is:

	- a. Minimalism: Everything that is not explicitly licensed is ruled out.
	- b. HPSG: Everything that is not ruled out works.

Let us consider some examples. The availability of type hierarchies makes it possible to underspecify part of speech information. For example, Sag (1997: 457) assumes that complementizer (*comp*) and verb (*verb*) have a common supertype *verbal*. A head can then select for a complement with the category *verbal*. So rather than specifying two lexical items with different valence information or, equivalently, one with a disjunctive specification *verb* ∨ *comp*, one has just one lexical item selecting for *verbal*. Similarly, schemata (grammar rules) can contain underspecified types. A daughter in a dominance schema can have a value of a certain type that subsumes a number of other types – say, three. Without this underspecification, one would need three schemata: one for every subtype of the more general type.
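The effect of such underspecification can be illustrated with a small sketch (Python; only the type names *verbal*, *verb*, and *comp* come from the text above – the hierarchy and the code are invented for exposition):

```python
# Toy type hierarchy with subsumption. A single selection for 'verbal'
# does the work of the disjunction verb ∨ comp. Purely illustrative.
SUPERTYPE = {"verb": "verbal", "comp": "verbal",
             "verbal": "head", "noun": "head"}

def subsumes(general, specific):
    """True if 'general' equals 'specific' or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUPERTYPE.get(specific)
    return False

print(subsumes("verbal", "verb"))  # True: a verbal complement may be a V
print(subsumes("verbal", "comp"))  # True: ... or a C
print(subsumes("verbal", "noun"))  # False: nouns do not qualify
```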

Quantifier scope can be underspecified as well (Copestake, Flickinger, Pollard & Sag 2005; Richter & Sailer 1999; Koenig & Richter 2021, Chapter 22 of this volume): constraints regarding which quantifier outscopes which other quantifier may be left unspecified. The absence of the respective constraints results in a situation where several scopings are possible. In transformational models, it is usually assumed that quantifier elements move into certain positions covertly and scope relations are read off the resulting tree (May 1985; Frey 1993; Sauerland & Elbourne 2002). This is unnecessary in HPSG. (See p. 1295 for wrong predictions from movement-based approaches to quantifier scope.)

<sup>11</sup>For a discussion of Chomsky's (1964; 1975: Chapter 5) proposals to deal with different degrees of acceptability, see Pullum & Scholz (2001: 29).

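The idea can be sketched as follows (Python; a drastically simplified stand-in for underspecified semantic representations such as MRS, with invented names):

```python
# Toy underspecified scope: with no dominance constraints, all scopings
# survive; each added constraint narrows the readings. Illustrative only.
from itertools import permutations

def scopings(quantifiers, outscopes):
    """Yield quantifier orders consistent with the constraints.

    outscopes: set of pairs (a, b) meaning 'a outscopes b'."""
    for order in permutations(quantifiers):
        if all(order.index(a) < order.index(b) for a, b in outscopes):
            yield order

# Underspecified: two readings for two quantifiers.
print(list(scopings(["every-dog", "a-cat"], set())))
# Adding a constraint pins the scope down to one reading.
print(list(scopings(["every-dog", "a-cat"], {("every-dog", "a-cat")})))
```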

# **3.3 Types and constraints vs. general operations**

The declarative-procedural contrast is an important one, but the contrast between the complex systems of types and constraints that are assumed within HPSG and the few general operations that form a Minimalist grammar is arguably more important.<sup>12</sup> Much work in Minimalism has three main operations: Merge, Agree, and Move or Internal Merge. Merge combines two expressions, either words or phrases, to form a larger expression with the same label as one of the expressions (Chomsky 1995b: 244; 2008: 140). Its operation can be presented as shown in Figure 2.

$$\mathrm{x},\ \mathrm{y} \;\Rightarrow\; [_{\mathrm{x}}\ \mathrm{x}\ \mathrm{y}] \quad\text{or}\quad [_{\mathrm{y}}\ \mathrm{x}\ \mathrm{y}]$$

Figure 2: Merge

In the case of English, the first alternative is represented by situations where a lexical head combines with a complement, while the second is represented by situations where a specifier combines with a phrasal head. Chomsky (2008: 146) calls items merged with the first variant of Merge *first-merged* and those merged with the second variant *later-merged*.
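Abstracting away from all feature content, the operation can be sketched as a function over labelled trees (Python; purely illustrative):

```python
# Toy Merge: combine two expressions into a binary tree whose label is
# taken from one of the two daughters. Purely illustrative.
def merge(x, y, projector):
    """projector must be x or y; its label becomes the mother's label."""
    assert projector in (x, y)
    return {"label": projector["label"], "daughters": [x, y]}

v = {"label": "V", "daughters": []}  # e.g. 'saw'
d = {"label": "D", "daughters": []}  # e.g. 'who' (a DP for Minimalism)

vp = merge(v, d, projector=v)        # head-complement: "first-merged"
print(vp["label"])                   # 'V': the verb projects
```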

Agree, as one might suppose, offers an approach to various kinds of agreement phenomena. It involves a probe, which is a feature or features of some kind on a head, and a goal, which the head c-commands. At least normally, the probe is a linguistic object with an uninterpretable feature or features with no value, and the goal has a matching interpretable feature or features with appropriate values (Chomsky 2001: 3–5).<sup>13</sup> Agree values the uninterpretable feature or features and they are ultimately deleted, commonly after they have triggered some morphological effect. Agree can be represented as in Figure 3, where the "u" prefix identifies a feature as uninterpretable, and we have just one uninterpretable feature on the probe and just one matching interpretable feature on the goal.

<sup>12</sup>A procedural approach doesn't necessarily involve a very simple grammatical system. The Standard Theory of Transformational Grammar (Chomsky 1965) is procedural but has many different rules, both phrase structure rules and transformations.

<sup>13</sup>Chomsky also assumes that the goal additionally has an uninterpretable feature of some kind to render it "active". In the case of subject-verb agreement, this is a Case feature on the subject.

Figure 3: Agree

Unsurprisingly, subject-verb agreement is one manifestation of Agree, where X is T(ense) and Y is a nominal phrase – for Minimalism a DP – inside the complement of T.<sup>14</sup> T presumably has two uninterpretable features, person and number, and the DP has two matching interpretable features. Here, and elsewhere, Agree is a non-local relation involving elements which are not sisters. This contrasts with the situation in HPSG, in which subject-verb agreement is a consequence of a relation between the subject and its VP sister and a relation between the VP and the V that heads it.<sup>15</sup>
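The valuation step can likewise be sketched (Python; the feature representation is invented for illustration):

```python
# Toy Agree: a probe with unvalued (uninterpretable) features copies the
# values of matching interpretable features on the goal. Illustrative only.
def agree(probe, goal):
    """Value every unvalued feature on the probe from the goal."""
    for feat, val in probe.items():
        if val is None and goal.get(feat) is not None:
            probe[feat] = goal[feat]  # valued -- and ultimately deleted

t = {"person": None, "number": None}  # T with uPerson, uNumber
dp = {"person": 3, "number": "sg"}    # the DP goal

agree(t, dp)
print(t)  # {'person': 3, 'number': 'sg'}
```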

Finally, Move, also called Internal Merge, is an operation which makes a copy of a constituent of some expression and merges it with that expression (Chomsky 1995b: Section 4.4; 2008: 140). The original element that is copied normally undergoes deletion. The process can be presented as in Figure 4.

This covers both the A′-movement process assumed for unbounded dependency constructions such as *wh*-interrogatives and the A-movement process assumed for raising sentences and passives. A question arises about so-called head-movement, where a head moves to a higher head position. This appears to mean that it must be possible for the copy to be merged with the head of the expression that contains it. However, this is incompatible with the widely assumed extension condition, which requires Merge to produce a larger structure. One response is the idea espoused in Chomsky (1995a: 368; 2001: 37) that head-movement takes place not in the syntax but in the Phonological Form (PF) component, which maps syntactic representations to phonetic representations. It seems that the status of head-movement is currently rather unclear.

<sup>14</sup>It is assumed within Minimalism that subjects originate inside the complement of T and that they are raised to the Specifier of T in English and many other languages.

<sup>15</sup>See also Wechsler (2021), Chapter 6 of this volume for a discussion of agreement in HPSG. Section 3 deals with locality issues.


Figure 4: Move
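In the same toy style (Python; purely illustrative), Internal Merge can be sketched as copying a constituent and re-merging the copy at the root, with the lower copy silenced:

```python
# Toy Internal Merge (Move): copy a constituent, re-merge the copy with
# the whole expression, and mark the original as silent. Illustrative only.
import copy

def internal_merge(tree, constituent):
    moved = copy.deepcopy(constituent)
    constituent["silent"] = True  # the lower copy is deleted (e.g. at PF)
    return {"label": tree["label"], "daughters": [moved, tree]}

who = {"label": "D", "word": "who"}
vp = {"label": "V",
      "daughters": [{"label": "V", "word": "saw"}, who]}

clause = internal_merge(vp, who)
print(clause["daughters"][0]["word"])  # 'who': the moved copy
print(who["silent"])                   # True: the original is silenced
```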

The three operations just outlined interact with lexical items to provide syntactic analyses. It follows that the properties of constructions must largely derive from the lexical items that they contain. Hence, the properties of lexical items are absolutely central to Minimalism. Oddly, the obvious implication – that the lexicon should be a major focus of research – seems to be ignored. As Newmeyer (2005: 95, fn. 9) comments:

[…] in no framework ever proposed by Chomsky has the lexicon been as important as it is in the MP [Minimalist Program]. Yet in no framework proposed by Chomsky have the properties of the lexicon been as poorly investigated. (Newmeyer 2005: 95, fn. 9)

Sometimes it is difficult to derive the properties of constructions from the properties of visible lexical elements. But there is a simple solution: postulate an invisible element. The result is a large set of invisible functional heads. As we will see in Section 4.1.5 with respect to various patterns of relative clauses and in Section 4.1.6 with respect to Rizzi-style topic and focus phrases, these heads do the work in Minimalism that is done by phrase types and the constraints on them in HPSG.

Although Minimalism is a procedural approach and HPSG a declarative one, there are some similarities between Minimalism and early HPSG, the approach presented in Pollard & Sag (1987; 1994). In much the same way as Minimalism has just a few general mechanisms, early HPSG had just a few general phrase types. Research in HPSG in the 1990s led to the conclusion that this is too simple and that a more complex system of phrase types is needed to accommodate the full complexity of natural language syntax. Nothing like this happened within Minimalism, almost certainly because there was little attempt within this approach to deal with the full complexity of natural language syntax. As noted above, the approach has rarely been applied in detailed formal analyses. It looks too simple, and it appears problematic in various ways. It is also a major source of the complexity that is characteristic of Minimalist syntactic structures, as we will see in Section 4.

# **3.4 Labelling**

As we noted in the last section, Merge combines two expressions to form a larger expression with the same label as one of the original two. But which of the original expressions provides this label? This issue has been discussed, but not very satisfactorily. Chomsky (2008: 145) defines which label is used in two different cases: in the first, the label is the label of the head if the head is a lexical item; in the second, the label is the label of the category from which something is extracted. As Chomsky notes, these rules are not unproblematic, since the label is not uniquely determined in all cases. An example is the combination of two lexical elements, since in such cases, either element could provide the label of the resulting structure. Chomsky notices that this could result in deviant structures, but he regards this as unproblematic and ignores it. This means that rather fundamental notions of the theory were ill-defined. A solution to this problem was provided five years later in his 2013 paper, but that paper is inconsistent (Müller 2016: Section 4.6.2). However, this inconsistency is not the point we want to focus on here. Rather, we want to show one more time that empirical standards are not met. Chomsky uses the underdetermination in his labelling rules to account for the two possible structures of (10), an approach going back to Donati (2006):

(10) what [ C [you wrote *t*]]

(10) can be an interrogative clause, as in *I wonder what you wrote*, or a free relative clause, as in *I will read what you wrote*. According to the labelling rule that accounts for sentences from which an item is extracted, the label will be CP, since the label is taken from the clause. However, since *what* is a lexical item, *what* can determine the label as well. If this labelling rule is applied, *what you wrote* is assigned DP as a label, and hence the clause can function as a DP argument of *read*.
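The underdetermination can be made explicit in the same toy style (Python; purely illustrative):

```python
# Toy labelling: for 'what [C [you wrote t]]', both labelling rules apply,
# so the expression can be labelled C(P) or D(P). Illustrative only.
def possible_labels(filler, extraction_source_label):
    labels = {extraction_source_label}  # rule 2: the clause projects
    if filler.get("lexical"):
        labels.add(filler["label"])     # rule 1: a lexical item may project
    return labels

what = {"label": "D", "lexical": True}  # 'what' as a lexical item
print(possible_labels(what, "C"))       # {'C', 'D'}: interrogative or
                                        # free relative reading
```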

Chomsky's proposal is interesting, but it does not extend to cases involving free relative clauses with complex *wh*-phrases (so-called pied-piping), as attested in examples like (11):

	- b. He gave me [what money] he had.


The example in (11a) is from one of the standard references on free relative clauses: Bresnan & Grimshaw (1978: 333), which appeared in the main MGG journal *Linguistic Inquiry* and is also cited in other mainstream generative work on free relative clauses, such as Groos & van Riemsdijk (1981) and van Riemsdijk (2006). (11b) is from Huddleston et al. (2002: 1068), a descriptive grammar of English.

Apart from the fact that complex *wh*-phrases are possible, there is even more challenging data in the area of free relative clauses: the examples in (12) and (13) show that there are non-matching free relative clauses:


(13) a. Worauf man sich mit einer Pro-form beziehen kann, […] ist eine Konstituente.<sup>17</sup>
where.upon one self with a Pro-form refer can is a constituent
'If you can refer to something with a Pro-form, […] it is a constituent.'

b. [Aus wem] noch etwas herausgequetscht werden kann, ist sozial dazu verpflichtet, es abzuliefern; …<sup>18</sup>
out who yet something out.squeezed be can is socially there.to obliged it to.deliver
'Those who have not yet been bled dry are socially compelled to hand over their last drop.'

In (12), a relative clause with a PP relative phrase functions as an accusative object. In (13), the relative clauses function as subjects. (13b) is another example of a relative clause with a complex *wh*-phrase. See Bausewein (1991) and Müller (1999a) for further discussion of free relative clauses and attested data.

According to Donati (2006: Section 5), pied-piping does not exist in free relatives (see also Citko 2008: 930–932 for a rejection of this claim). Given how much attention the issue of labelling has received and how central it is to Minimalist analyses, this situation is quite surprising: an empirically false claim made in 2002/2003 at two high-profile conferences is the basis for foundational work from 2002 until 2013, even though the facts are common knowledge in the field.

<sup>16</sup>Bausewein (1991: 154).

<sup>17</sup>From the main text of: Günther Grewendorf. 1988. *Aspekte der deutschen Syntax: Eine Rektions-Bindungs-Analyse* (Studien zur deutschen Grammatik 33). Tübingen: Gunter Narr Verlag, p. 16, quoted from Müller (1999a: 61).

<sup>18</sup>Wiglaf Droste, taz, 01.08.97, p. 16, quoted from Müller (1999a: 61).

Ott (2011) develops an analysis in which the category of the relative phrase is projected, but he does not have a solution for non-matching free relative clauses, as he admits in a footnote on page 187. The same is true for Citko's analysis (2008), in which the extracted XP can provide the label. So, even though the data has been known for decades, it is ignored by authors and reviewers, and foundational work is built on shaky empirical ground. See Müller (2016: Section 4.6.2) for a more detailed discussion of labelling.

# **3.5 Feature deletion and "crashing at the interfaces"**

In Section 3.3, we mentioned Case as an uninterpretable feature which renders a DP active. Like other uninterpretable features, it is deleted as a result of Agree because it is not interpretable at LF. This means that Minimalism claims that a case-marked NP like *der Mann* 'the man' is not interpretable unless it is somehow stripped of its case information. So in Minimalism, *der Mann* needs something on top of the DP that Agrees with it and thereby consumes the case feature. While this seems cumbersome to most researchers working outside Minimalism, there are deeper problems connected to the deletion of case features. There are situations in which case features are needed more than once. An example is free relative clauses like the one in (14b):

(14) b. Ich treffe, wen ich treffen will.
I meet who.ACC I meet want.to
'I meet whoever I like to meet.'

*wen* 'who' is the accusative object in the relative clause. Since it is an object, its case feature will be checked by the selecting verb *treffen* 'meet'. *wen* will then be a DP without any case information. However, the case of the relative phrase in free relative clauses is not arbitrary. It is important for the integration of the free relative clause in the matrix clause. The case of *wer* 'who' in a complete relative clause has to be known since it is important for the external distribution of the free relative clause, as the examples in (15) show:

(15) a. Wer mich treffen will, kann vorbeikommen.
who.NOM me meet wants.to may over.come
'Whoever wants to meet me may come over.'

b. \* Ich treffe, wer mich treffen will.
I meet who.NOM me meet wants.to
'I meet whoever wants to meet me.'

HPSG also consumes resources in a way: items in valence representations are not projected further up the tree once the requirement is saturated. The difference is that objects with a certain structure and with certain features are not themselves modified: a case-marked NP is not deprived of its case information. We think that this is the right way to deal with morphological markings and with feature specifications in general.
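The difference can be sketched as follows (Python; a toy rendering of valence saturation with invented structures):

```python
# Toy head-complement combination: the COMPS requirement is checked off,
# but the complement itself is left untouched, so its CASE value remains
# available higher up. Purely illustrative.
def head_complement(head, comp):
    assert head["comps"] and head["comps"][0] == comp["cat"]
    return {"cat": head["cat"],
            "comps": head["comps"][1:],  # the requirement is saturated
            "daughters": [head, comp]}   # comp keeps all its features

treffen = {"cat": "V", "comps": ["NP[acc]"]}
wen = {"cat": "NP[acc]", "case": "acc"}

vp = head_complement(treffen, wen)
print(vp["comps"])                 # []: the valence slot is consumed
print(vp["daughters"][1]["case"])  # 'acc': the case info is still there
```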

# **3.6 Some implications**

We will look in detail at the implications of this machinery for syntactic structure in the next section. However, we note some of them in the following paragraphs as a kind of preview.

First, the fact that Merge combines two expressions entails that syntactic structures are confined to binary branching and excludes various analyses that have been assumed within HPSG and other frameworks. Second, the assumption that expressions produced by Merge have the same label as one of the expressions that they consist of (Chomsky 2008: 145) is essentially the assumption that all complex expressions are headed. For HPSG, as for many other approaches, there are headed expressions and non-headed expressions, e.g., coordination and the NPN Construction discussed in Sections 4.2.2 and 4.2.3, respectively.

As emphasized above, a further important feature of Minimalism is the view that semantics and morphology are simple reflections of syntax. The basic architecture assumed in Minimalism is shown in Figure 5. Both phonology and semantics are read off the structures produced by syntax. The idea that semantics is a simple reflection of syntax goes back to the early years of Transformational Grammar. One aspect of this idea was formalized as the Uniformity of Theta Assignment Hypothesis (UTAH) by Baker (1988: 46).

(16) Uniformity of Theta Assignment Hypothesis Identical thematic relationships between items are represented by identical structural relationships between those items at the level of D-structure.

Minimalism abandoned the notion of D-structure, but within Minimalism the Hypothesis can be reformulated as follows:

Figure 5: Syntax-centric architecture in Minimalism before the Phase model (left) and in the Phase model (right) according to Richards (2015: 812, 830)

(17) Uniformity of Theta Assignment Hypothesis (revised) Identical thematic relationships between items are represented by identical structural relationships between those items when introduced into the structure.

We will look at some of the implications of this in the next section.

The idea that morphology is a simple reflection of syntax is also important. As we will discuss in the next section, it leads to abstract underlying structures and complex derivations and to functional heads corresponding to various suffixes. Again, we will say more about this in the next section.

# **4 Different views of syntactic structure**

The very different views of grammar that are assumed in Minimalism and HPSG naturally lead to very different views of syntactic structure. The syntactic structures of Minimalism are both very complex and very simple. This sounds paradoxical, but it isn't. They are very complex in that they involve much more structure than is assumed in HPSG and other approaches. But they are very simple in that they have just a single ingredient – they consist entirely of local trees in which there is a head responsible for the label of the local tree and a single nonhead. From the standpoint of HPSG, they are both too complex and too simple. We will consider the complexity in Section 4.1 and then turn to the simplicity in Section 4.2.

# **4.1 The complexity of Minimalist structures**

For HPSG, as the chapters in this volume illustrate, linguistic expressions have a single relatively simple constituent structure with a minimum of phonologically empty elements.<sup>19</sup> For Minimalism, they have a complex structure containing a variety of empty elements and with various constituents occupying more than one position in the course of the derivation. Thus the structures assumed within Minimalism are not at all minimalist. But this complexity is a more or less inevitable consequence of the Minimalist view of grammar outlined above.

### **4.1.1 Uniformity of structures due to semantic representation**

There are a variety of sources of complexity, and some predate Minimalism.<sup>20</sup> This is true especially of the idea that semantics and morphology are simple reflections of syntax (on morphology see Section 4.1.3). For the syntax-semantics relation, UTAH, which we introduced on p.1271, is particularly important. It leads to a variety of abstract representations and movement processes. Consider, for example, the following:

	- a. Who did Lee see?
	- b. Lee saw who

*Who* bears the same thematic relation to the verb *see* in (18a) as in (18b). Assuming UTAH, it follows that *who* in (18a) should be introduced in the object position which it occupies in (18b) and then be moved to its superficial position. Consider next the following:

<sup>19</sup>The relatively simple structures of HPSG are not an automatic consequence of its declarative nature. Postal's Metagraph Grammar framework (formerly known as Arc Pair Grammar) is a declarative framework with structures that are similar in complexity to those of Minimalism (see Postal 2010).

<sup>20</sup>For interesting discussion of the historical development of the ideas that characterize Minimalism, see Culicover & Jackendoff (2005: Chapters 2 and 3).

	- a. Lee was seen (by Kim).
	- b. Kim saw Lee.

Here, *Lee* bears the same thematic relation to the verb *see* in (19a) as in (19b). Hence, it follows that *Lee* in (19a) should be introduced in the object position which it occupies in (19b) and then be moved to its superficial subject position. Finally, consider these examples:

	- a. Lee seems to be ill.
	- b. It seems that Lee is ill.

Here, *Lee* bears the same thematic relation to *ill* in (20a) as in (20b). Thus, it follows that *Lee* in (20a) should be introduced in the same position as *Lee* in (20b). The standard Minimalist approach assumes that *Lee* in both examples originates in a position adjacent to *ill* and is moved a short distance in (20b) but a longer distance in (20a).

These analyses are more or less inevitable if one accepts UTAH. But how sound is UTAH? Work in HPSG shows that it is quite possible to capture both the syntactic and the semantic properties of these sentence types without the assumption that the crucial constituents occupy more than one position. Thus, there is no reason to accept UTAH.

### **4.1.2 Lexical decomposition à la Generative Semantics**

The idea that semantics is a simple reflection of syntax has led to other kinds of complexity. For example, it has led to a revival of the idea, once characteristic of Generative Semantics, that lexical items may derive from complex expressions which in some sense represent their meanings.<sup>21</sup> Thus, Hale & Keyser (1993) argue that (21a) derives from a structure like that of (21b).

	- a. Kim shelved the books.
	- b. Kim put the books on the shelf.

One problem with this proposal is that *shelve X* means more than just *put X on the shelf*. Thus, (22a) is not equivalent to (22b).

	- a. Kim put his elbow on the shelf.
	- b. Kim shelved his elbow.

<sup>21</sup>For typical Generative Semantics proposals of this kind, see McCawley (1968) and Postal (1970). Like Minimalism, Generative Semantics was characterized by extremely complex syntactic structures and for similar reasons. See Newmeyer (1986: Chapter 4) for discussion.


Moreover, as Culicover & Jackendoff (2005: 54–55) point out and as Hale & Keyser (1993: 105, fn. 7) note themselves, denominal verbs can have many different interpretations.<sup>22</sup>

	- b. He microwaved the food. (He put the food in the microwave and in addition he heated it.)
	- c. Lee chaired the meeting. (Lee was the chairperson of the meeting.)
	- d. Sandy skinned the rabbit. (Sandy removed the skin from the rabbit.)
	- e. Kim pictured the scene. (Kim constructed a mental picture of the scene.)
	- f. They stoned the criminal. (They threw stones at the criminal.)
	- g. He fathered three children. (He was the biological father of three children.)
	- h. He mothers his students. (He treats his students the way a mother would.)

Denominal verbs need to be associated with the correct meanings, but there is no reason to think that syntax has a role in this.<sup>23</sup>

### **4.1.3 Complex structures and morphology**

The idea that morphology is a simple reflection of syntax also leads to syntactic complexity. The fact that verbs in English and many other languages are marked for tense is one reason for the assumption that there is a T(ense) head at the heart of clause structure. Thus the sentence in (24) has the analysis in Figure 6.

(24) The cat chased the dog.

The verbal stem moves to the T head to pick up the -*ed* suffix.

Similarly, the fact that nouns in English and other languages are marked for number leads to the assumption that there is a Num(ber) head at the heart of noun phrase structure.

<sup>22</sup>The examples in (23c), (23g), and (23h) are taken from Culicover & Jackendoff (2005: 54–55) or are parallel to examples they discussed.

<sup>23</sup>See Culicover & Jackendoff (2005: 53–56) for further discussion. For more recent Minimalist work assuming lexical decomposition, see, e.g., Harley (2012).

Figure 6: TP/VP analysis of simple English sentences

These elements are not solely motivated by morphology. The assumption that verbs move to T and nouns to Num in some languages but not others provides a way of accounting for cross-linguistic word order differences (Pollock 1989).<sup>24</sup> However, assumptions about morphology are an important part of the motivation. As discussed in Crysmann (2021), Chapter 21 of this volume, HPSG assumes a realizational approach to morphology, in which affixes are just bits of phonology realizing various properties of inflected words or derived lexemes. Hence, analyses like these are out of the question.

### **4.1.4 Binary branching**

Another source of complexity, which also predates Minimalism, is the assumption that all structures are binary branching. As Culicover & Jackendoff (2005: 112–116) note, this idea goes back to the 1980s. It entails that there can be no structures of the form in Figure 7a. Rather, all structures must take the form in Figure 7b or Figure 7c. As Culicover & Jackendoff discuss, the arguments for the binary branching restriction have never been very persuasive. Moreover, it is incompatible with various analyses which have been widely accepted in HPSG and other frameworks. We will return to this topic in Section 4.2.

### **4.1.5 Unbounded dependency constructions**

As noted in Section 3, the simplicity of the Minimalist grammatical system means the properties of constructions must largely derive from the lexical items that they contain.

<sup>24</sup>See Kim (2021), Chapter 18 of this volume for a discussion of Pollock's proposal.


Figure 7: Flat and binary branching

Hence, the properties of lexical items are absolutely central to Minimalism, and often this means the properties of phonologically empty items, especially empty functional heads. Thus, such elements are a central feature of Minimalist syntactic structures. These elements do much the same work as phrase types and the associated constraints in HPSG.

The contrast between the two frameworks can be illustrated with unbounded dependency constructions. Detailed HPSG analyses of various unbounded dependency constructions are set out in Sag (1997; 2010) and Ginzburg & Sag (2000), involving a complex system of phrase types (see also Borsley & Crysmann 2021, Chapter 13 of this volume). For Minimalism, unbounded dependency constructions are headed by a phonologically empty complementizer (C) and have either an overt filler constituent or an invisible filler (an empty operator) in their specifier position. Essentially, then, they have the structure in Figure 8. All the properties of the construction must stem from the properties of the C that heads it.

Figure 8: CP structures in Minimalism

Relative clauses are an important unbounded dependency construction. In English there are *wh*-relatives, non-*wh*-relatives, and finite and non-finite relatives. *Wh*-relatives are illustrated by the following:


(25) a. someone [who you can rely on]
 b. someone [on whom you can rely]

(26) a. \* someone [who to rely on]
 b. someone [on whom to rely]

These show that whereas finite *wh*-relatives allow either an NP or a PP as the filler, non-finite *wh*-relatives only allow a PP. In the HPSG analysis of Sag (1997), the facts are a consequence of constraints on two phrase types. A constraint on the type *fin-wh-fill-rel-cl* allows the first daughter to be an NP or a PP, while a constraint on *inf-wh-fill-rel-cl* requires the first daughter to be a PP. For Minimalism, the facts must be attributed to the properties of the complementizer. There must be a complementizer which takes a finite TP complement and allows either an NP or a PP as its specifier and another complementizer which takes a nonfinite TP complement (with an unexpressed subject) and only allows a PP as its specifier.

Non-*wh*-relatives require further phrase types within HPSG and further complementizers in Minimalism. However, rather than consider this, we will look at another unbounded dependency construction: *wh*-interrogatives. The basic data that needs to be accounted for is illustrated by the following:

	- a. Who knows?
	- b. I wonder [who knows].
	- c. Who did Kim talk to?
	- d. I wonder [who Kim talked to].
	- e. I wonder [who to talk to].

Like *wh*-relatives, *wh*-interrogatives can be finite and non-finite. When they are finite, their form depends on whether the *wh*-phrase is the subject of the highest verb or something else. When it is the subject of the highest verb, it is followed by what looks like a VP, although it may be a clause with a gap in subject position. When the *wh*-phrase is something else, the following clause shows auxiliary-initial order if it is a main clause and subject-initial order if it is not. Non-finite *wh*-interrogatives are a simple matter, especially as the filler does not have to be restricted in the way that it does in non-finite *wh*-relatives. Ginzburg & Sag (2000: Sections 6.5.2, 6.5.3) present an analysis which has two types for finite *wh*-interrogatives, one for subject-*wh*-interrogatives such as those in (27a) and (27b), and another for non-subject-*wh*-interrogatives such as those in (27c) and (27d). The latter is subject to a constraint requiring it to have the same value for the features IC (INDEPENDENT-CLAUSE) and INV (INVERTED). Main clauses are [IC +] and auxiliary-initial clauses are [INV +]. Hence the constraint ensures that a non-subject-*wh*-interrogative shows auxiliary-initial order when it is a main clause.

How can the facts be handled within Minimalism? As noted above, Minimalism analyses auxiliary-initial order as a result of movement of the auxiliary to C, triggered by some feature of C. Thus C must have this feature when (a) it heads a main clause and (b) the *wh*-phrase in its specifier position is not the subject of the highest verb. There are no doubt various ways in which this might be achieved, but the key point is that the properties of a phonologically empty complementizer are crucial.

Borsley (2006b; 2017) discusses Minimalist analyses of relative clauses and *wh*-interrogatives and suggests that at least eight complementizers are necessary. One is optionally realized as *that*, and another is obligatorily realized as *for*. The other six are always phonologically empty. But it has been clear since Ross (1967) and Chomsky (1977) that relative clauses and *wh*-interrogatives are not the only unbounded dependency constructions. Here are some others:


Each of these constructions will require at least one empty complementizer. Thus, a comprehensive account of unbounded dependency constructions will require a large number of such elements. But with a large unstructured set of complementizers there can be no distinction between properties shared by some or all elements and properties restricted to a single element. There are a variety of shared properties. Many of the complementizers will take a finite complement, many others will take a non-finite complement, and some will take both. There will also be complementizers which take the same set of specifiers. Most will not attract an auxiliary, but some will, not only the complementizer in an example like (27c) but also the complementizers in the following, where the auxiliary is in italics:

	- b. Kim is in Colchester, and so *is* Lee.



Thus, there are generalizations to be captured here. The obvious way to capture them is with the approach developed in the 1980s in HPSG work on the hierarchical lexicon (Flickinger, Pollard & Wasow 1985; Flickinger 1987), i.e., a detailed classification of complementizers which allows properties to be associated not just with individual complementizers but also with classes of complementizers. With this, it should be possible for Minimalism not just to get the facts right but also to capture the full set of generalizations. In many ways such an analysis would be mimicking the HPSG approach with its hierarchy of phrase types.<sup>25</sup> But in the present context, the main point is that the simplicity of the Minimalist grammatical system is another factor which leads to more complex syntactic structures than those of HPSG.

### **4.1.6 Syntactification of semantic categories**

The left periphery of the clause is often much more complex than assumed in the last section as a result of the syntactification of semantic properties (Rizzi 2014), which is one aspect of the idea that semantics is a simple reflection of syntax. This is especially apparent in a sub-school that calls itself "cartographic". MGG comes with strong claims about the autonomy of syntax. There is a syntactic component and then there are the components of Phonological Form (PF) and Logical Form (LF); in more recent versions of the theory, these are the articulatory-perceptual system (AP) and the conceptual-intentional system (CI). Figure 5 shows the early Minimalist architecture and the architecture assumed in the Phase-based models. Syntax was always regarded as primary, with PF and LF derived from syntactic representations, and this remains so in Minimalism. The problem is that questions of intonation are connected to semantic and information-structural properties (Halliday 1970: 36). A way around this is to stipulate syntactic features that can be interpreted by both PF and LF (Gussenhoven 1983). Another way of dealing with the data is to employ empty elements that are responsible for a certain ordering of elements and that can be interpreted in the semantics. The accounts of Rizzi and Cinque are very prominent in this school of thought. For example, Rizzi (1997) suggests an analysis of the left periphery of clauses that incorporates special functional projections for topic and focus. His analysis is shown in Figure 9. In comparison, no such projections exist in HPSG theories. HPSG grammars are surface-oriented, and the syntactic labels correspond for the most part to classical part-of-speech categorizations.

<sup>25</sup>For a fuller discussion of these issues, see Borsley (2006b; 2017).


Figure 9: Syntactic structure of sentences following Rizzi (1997: 297)


So in examples with frontings like (30), the whole linguistic object is a verbal projection and not a Topic Phrase, a Focus Phrase, or a Force Phrase.

(30) Bagels, I like.

Of course the fronted elements may be topics or foci, but this is a property that is represented independently of syntactic information in parts of feature descriptions having to do with information structure. For treatment of information structure in HPSG, see Engdahl & Vallduví (1996), De Kuthy (2002), Song (2017) and also De Kuthy (2021), Chapter 23 of this volume. On determination of clause types, see Ginzburg & Sag (2000) and Müller (2015b). For general discussion of the representation of information usually assigned to different linguistic "modules" and on "interfaces" between them in theories like LFG and HPSG, see Kuhn (2007).

Cartographic approaches also assume a hierarchy of functional projections for the placement of adverbials. Some authors assume that all sentences in all languages have the same structure, which is supposed to explain orders of adverbials that seem to hold universally (e.g., Cinque 1999: 106 and Cinque & Rizzi 2010: 54–55). A functional head selects for another functional projection to establish this hierarchy of functional projections, and the respective adverbial phrases can be placed in the specifier of the corresponding functional projection. Cinque (1999: 106) assumes 32 functional projections in the verbal domain. Cinque & Rizzi (2010: 57, 65) assume at least four hundred functional heads, which are – according to them – all part of a genetically determined UG.

In comparison, HPSG analyses assume that verbs project both in head-argument and head-adjunct structures: a verb that is combined with an argument is a verbal projection. If an adverb attaches, a verbal projection with the same valence but augmented semantics results. Figure 10 shows the Cartographic and the HPSG structures. While the adverbs (Adv<sup>1</sup> and Adv<sup>2</sup> in the figure) attach to verbal projections in the HPSG analysis (S and VP are abbreviations standing for verbal projections with different valence requirements), the Cartographic approach assumes empty heads that select a clausal projection and provide a specifier position in which the adverbs can be realized. For the sake of exposition, we call these heads FAdv<sup>1</sup> and FAdv<sup>2</sup>. For example, FAdv<sup>2</sup> can combine with the VP and licenses an Adv<sup>2</sup> in its specifier position. As is clear from the figure, the Cartographic approach is more complex since it involves two additional categories (FAdv<sup>1</sup> and FAdv<sup>2</sup>) and eight nodes for the adverbial combination rather than four.

An interesting difference is that verbal properties are projected in the HPSG analysis. This makes it clear whether a VP contains an infinitive or a participle.

Figure 10: Treatment of adverbial phrases in Cartographic approaches and in HPSG

This property is important for the selection by a superordinate head, e.g., the auxiliary in the examples in (31).

	- a. Kim has met Sandy.
	- b. Kim will meet Sandy.

In a Cartographic approach, one has to assume either that adverbial projections have features correlated with verbal morphology or that superordinate heads may check properties of linguistic items that are deeply embedded.

If one believed in Universal Grammar (which researchers working in HPSG usually do not, see also Ball's (2021) chapter on understudied languages and universals) and in innately specified constraints on adverb order, one would not assume that all languages contain the same structures, some of which are invisible. Rather, one would assume linearization constraints (see Müller 2021b: Section 2, Chapter 10 of this volume) to hold crosslinguistically.<sup>26</sup> If adverbs of a certain type do not exist in a language, the linearization constraints would not do any harm. They just would never apply, since there is nothing to apply to (Müller 2015c: 46).

<sup>26</sup>Adjuncts are usually not siblings in local structures in HPSG (but see Kasper 1994 and Bouma & van Noord 1998: 62, 71). There are nevertheless ways to impose order constraints on nonsiblings. Engelkamp, Erbach & Uszkoreit (1992) discuss one approach; another approach would be to have Reape-style order domains (Reape 1994) in addition to the immediate dominance schemata for head-adjunct combination. See Müller (2021b: Section 6), Chapter 10 of this volume for more on order domains.

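The point that vacuously satisfied linearization constraints do no harm can be illustrated with a toy sketch (Python; the adverb classes and the constraint are invented for exposition):

```python
# Toy linearization constraint: class1 adverbs precede class2 adverbs.
# In a language without class2 adverbs, the constraint is vacuously
# satisfied -- there is nothing for it to apply to. Illustrative only.
def respects_order(words, adv_class, earlier, later):
    pos = {c: [i for i, w in enumerate(words) if adv_class.get(w) == c]
           for c in (earlier, later)}
    return all(i < j for i in pos[earlier] for j in pos[later])

CLASSES = {"probably": "class1", "quickly": "class2"}

print(respects_order(["probably", "quickly", "left"],
                     CLASSES, "class1", "class2"))  # True
print(respects_order(["kim", "left"],
                     CLASSES, "class1", "class2"))  # True: vacuous
```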

For actual HPSG analyses dealing with adverb order, see Koenig & Muansuwan (2005) and Abeillé & Godard (2004). The work of Koenig & Muansuwan (2005) is particularly interesting here since the authors provide an analysis of the intricate Thai aspect system and explicitly compare their analysis to Cinque-style analyses.

### **4.1.7 Summary**

Having discussed uniformity in theta role assignment, Generative Semantics-like approaches, branching, nonlocal dependencies, and Cartographic approaches to the left periphery and adverb order within clauses, we conclude that a variety of features of Minimalism lead to structures that are much more complex than those of HPSG. HPSG shows that this complexity is unnecessary given a somewhat richer conception of grammar.

# **4.2 The simplicity of Minimalist structures**

As we emphasized above, while Minimalist structures are very complex, they are also simple in the sense that they have just a single ingredient, local trees consisting of a head and a single non-head. Most outsiders agree that this is too simple.

### **4.2.1 Binary branching, VPs, and verb-initial clauses**

We look first at binary branching.<sup>27</sup> As we noted above, the assumption that all branching is binary is incompatible with various analyses which have been widely accepted in HPSG and other frameworks. For example, it means that the bracketed VP in (32), which contains two complements, cannot have the ternary branching structure in Figure 11, which is suggested in Pollard & Sag (1994: 36) and much other work.

<sup>27</sup>In addition to structures with two or more branches, HPSG uses unary branching structures both in syntax and in the lexicon (lexical rules basically are unary branching structures); see Davis & Koenig (2021: Section 5), Chapter 4 of this volume. For example, unary branching syntactic rules are used for semantic type shifting (Partee 1986). For respective HPSG analyses see Flickinger (2008: 91–92), Gerbl (2007: 241–242) and Müller (2009: 225). The lack of unary branching structures in Minimalism is no problem since empty heads can be used instead. The empty head projects the properties that would be otherwise assigned to the mother node of the unary projection. See for example Ramchand (2005: 370). So, while the effects of unary projections can be modelled, the resulting structures are more complex. For a general discussion of empty elements, unary projections, and lexical rules, see Müller (2016: Sections 19.2 and 19.5).


(32) Kim [gave a book to Lee].

Figure 11: Flat structure for the VP *gave a book to Lee*

Instead, it has been assumed since Larson (1988) that the VP in examples like (32) has something like the structure in Figure 12.

Figure 12: Larson-type analysis of VPs involving little *v*

It is assumed that the verb originates in the lower VP and is moved into the higher VP. The higher V position to which the verb moves is commonly labelled *v* ("little *v*") and the higher phrase *v*P. The main argument for such an analysis appears to involve anaphora, especially contrasts like the following:

	- a. John showed Mary herself in the picture.
	- b. \* John showed herself Mary in the picture.

The first complement can be the antecedent of a reflexive which is the second complement, but the reverse is not possible.

If constraints on anaphora refer to constituent structure as suggested by Chomsky (1981), the contrast suggests that the second NP should be lower in the structure than the first NP. But, as suggested by Pollard & Sag (1992), it is assumed in HPSG that constraints on anaphora refer not to constituent structure but to a list containing all arguments in order of obliqueness, in recent versions of HPSG the ARG-ST list (see also Müller 2021a, Chapter 20 of this volume). On this view, anaphora can provide no argument for the complex structure in Figure 12. Therefore, both flat structures and binary branching structures with different branching directions as in Figure 13 are a viable option in HPSG.

Figure 13: Possible analysis of VPs in HPSG with a branching direction differing from Larson-type structures

Müller (2015a: Section 2.4; 2021d) argues for such binary branching structures as a result of parametrising the Head-Complement Schema for various variants of constituent order (head-initial and head-final languages with fixed constituent order and languages like German and Japanese with freer constituent order).

The fact that Merge combines two expressions also means that the auxiliary-initial clause in (34) cannot have a flat structure with both subject and complement(s) as sisters of the verb, as in Figure 14.

### (34) Will Kim be here?

It is standardly assumed in Minimalism that the auxiliary-initial clause has a structure of the form in Figure 15 or more complicated structures, as explained in Section 4.1.6. *Will* is analysed as a T(ense) element which moves to the C(omplementizer) position. A binary branching analysis of some kind is the only possibility within Minimalism, provided the usual assumptions are made.


Figure 14: Flat structure for *Will Kim be here?*

Figure 15: CP/TP structure for *Will Kim be here?*

It is not just English auxiliary-initial clauses that cannot have a ternary branching analysis within Minimalism but verb-initial clauses in any language. A notable example is Welsh, which has verb-initial order in all types of finite clause. Here are some relevant examples:<sup>28</sup>


<sup>28</sup>Positive main clause verbs are optionally preceded by a particle (*mi* or *fe*). We have included this in (35a) but not in (35b). When it appears, it triggers so-called soft mutation. Hence (35a) has *gerddith* rather than the basic form *cerddith*, which is seen in (35b).


A variety of transformational work, including work in Minimalism, has argued for an analysis like Figure 15 for Welsh finite clauses (see, e.g., Jones & Thomas 1977, Sproat 1985, Sadler 1988, Rouveret 1994, and Roberts 2005). But Borsley (2006a) argues that there is no theory-neutral evidence for a structure of this kind. Hence, at least for Welsh, it seems that a simpler flat structure like Figure 14 is preferable.<sup>29</sup> Note that we do not argue that structures like the one in Figure 15 are inappropriate for all languages. An analogue of head-movement analyses is standard among HPSG grammarians of German, and there is data from apparent multiple frontings that makes an analysis which is the HPSG analogue of head-movement unavoidable (Müller 2003a; 2005b). See Müller (2021c) for a book-length discussion of German clause structure. Müller (2021b: Section 5.1), Chapter 10 of this volume also discusses head-movement in HPSG.

### **4.2.2 Headedness and coordination**

We turn now to the idea that all structures are headed. For HPSG, and many other approaches, there are headed structures and non-headed structures. Probably the most important example of the latter are coordinate structures such as those in (36) (see Sag 2003 and Abeillé & Chaves 2021, Chapter 16 of this volume, for HPSG analyses; Section 2 of the latter work explicitly deals with headedness).

(36) [Kim and Lee] [write poems and paint pictures].

Much work in Minimalism assumes that coordinate structures are headed by the conjunction (Larson 1990: 596; Radford 1993: 89; Kayne 1994: Chapter 6; Johannessen 1998: 109; Van Koppen 2005: 8; Bošković 2009: 474; Citko 2011: 27).<sup>30</sup> This suggests that both coordinate structures in (36) are conjunction phrases. This in turn suggests that it should be possible for the two coordinate structures in (36) to replace each other, giving (37).

(37) [Write poems and paint pictures] [Kim and Lee]

Obviously, this is not possible.<sup>31</sup> It is fairly clear that conjunctions cannot be ordinary heads. Johannessen (1996: 669) suggests an analysis in which a coordinate structure has the features of the first conjunct. She depicts the analysis as in Figure 16.

<sup>29</sup>Borsley (2016) argues for a similar flat structure for the Caucasian ergative SOV language Archi.

<sup>30</sup>Kayne (1994: 57) differs from other proposals in not assuming the category for the conjunction. Instead, he uses X<sup>0</sup> as the category in his structured examples. Since X is an underspecified variable, his theory is underdetermined: while a ConjP is not compatible with any requirement by a governing head, an XP could appear as an argument of any dominating head. Kayne needs to work out a theory that determines the properties of the projected XP in relation to the coordinated items. We discuss this below.

<sup>31</sup>For a more detailed critique of the ConjP approach, see Borsley (2005).


Figure 16: Analysis of coordination with projection of features from the first conjunct according to Johannessen (1996: 669)

The problem is that it is unclear how this should be formalized: either the head category of the complete object is ConjP or it is X. Governing heads have to know where to look for the category. If they look at X, why is the part of speech information of Co projected? Why would governing heads not look at the category of other specifiers rather than their heads? Furthermore, coordinations are not equivalent to the first conjunct. There are cases where the coordination is the sum of its parts. For example, *Kim and Sandy* is a plural NP, as the agreement with the verb shows:

(38) Kim and Sandy laugh.

Johannessen's analysis seems to predict that the coordination of *Kim* and *Sandy* behaves like *Kim*, which is not the case. So, if one wants to assume an analysis with the conjunction as a head, one would have to assume that the head is a functor taking into account the properties of its specifier and complement, and projecting nominal information if they are nominal, verbal if they are verbal, etc. (Steedman 1991). This would make conjunctions a unique type of head with a unique relation to their specifier and complement. A problem for this approach is coordinate structures in which the conjuncts belong to different categories, e.g., the following:

	- b. Hobbs is [angry and in pain].

Such examples have led to HPSG analyses in which coordinate structures have whatever properties are common to the two conjuncts (Sag 2003). Within Minimalism, one might try to mimic such analyses by proposing that conjunctions have whatever properties are common to their specifier and complement. But a problem arises with an example like (40), where the conjuncts are not phrases but words.


(40) Kim [criticized and insulted] his boss.

To accommodate such examples, conjunctions would have to acquire not only part of speech information from the conjuncts but also selectional information. They would be heads which combine with a specifier and a complement to form an expression which, like a typical head, combines with a specifier and a complement. This would be a very strange situation and in fact it would make wrong predictions, since the object *his boss* would be the third-merged item. It would hence be "later-merged" in the sense of Chomsky (2008: 146) and therefore treated as a specifier rather than a complement.<sup>32</sup>

### **4.2.3 Binary branching and headless structures: The NPN Construction**

Another problem for Minimalist theories is the NPN Construction discussed by Matsuyama (2004) and Jackendoff (2008). Examples are provided in (41):

	- b. Day after day after day went by, but I never found the courage to talk to her. (Bargmann 2015)

As Jackendoff argued, it is not possible to identify one of the elements in the construction as the head. The construction has several peculiar properties, and we share Jackendoff's view that these constructions are best treated by a phrasal configuration in which these highly idiosyncratic properties are handled. The construction is discussed in more detail in Müller (2021e), Chapter 32 of this volume, where Bargmann's analysis within HPSG is provided. Bargmann's analysis also captures multiple repetitions of the PN sequence, as in (41b). Up until now, there have been few proposals for NPN in the Minimalist framework: Travis (2003) and G. Müller (2011: Section 3). G. Müller develops a post-syntactic reduplication account, which he assumes to be purely phonological (p. 235). He states that reduplication applies to words only and claims that German differs from English in not allowing adjective-noun sequences in NPN Constructions. He is aware of the possibility of these constructions in English (*miserable day after miserable day*) and states that his analysis is intended to account for the German data only. While this on its own is already a serious shortcoming of the analysis, the empirical claim does not hold water either, as the following example from Müller (2021e: 1533), Chapter 32 of this volume, shows:

<sup>32</sup>There have been attempts to argue that conjuncts are always phrases (Kayne 1994, Bruening 2018). But this position seems untenable (Abeillé 2006, Müller 2018: Section 7).


(42) Die beiden tauchten nämlich geradewegs wieder aus dem heimischen Legoland auf, wo sie im Wohnzimmer, schwarzen Stein um schwarzen Stein, vermeintliche Schusswaffen nachgebaut hatten.<sup>33</sup>
the two surfaced namely straightaway again from the home Legoland PART where they in.the living.room black brick after black brick alleged firearms recreated had
'The two surfaced straightaway from their home Legoland where they had recreated alleged firearms black brick after black brick.'

Apart from failing on the reduplication of adjective-noun combinations like *schwarzen Stein* 'black brick', the reduplication approach also fails on NPN patterns with several PN repetitions as in (41b): if the preposition is responsible for reduplicating content, it is unclear how the first *after* is supposed to combine with *day* and *day after day*. It is probably possible to design analyses of the NPN Construction involving several empty heads, but it is clear that these solutions would come at a rather high price. Similar criticism applies to Travis' (2003) account. Travis suggests a syntactic approach to reduplication: there is a special Q head and some part of the complement that Q takes is moved to the specifier position of Q. This analysis raises several questions: why can incomplete constituents move to SpecQP? How is the external distribution of NPN Constructions accounted for? Are they QPs? Where can QPs appear? Why do some NPN Constructions behave like NPs? How is the meaning of this construction accounted for? If it is assigned to a special Q, the question is: how are examples like (41b) accounted for? Are two Q heads assumed? And if so, what is their semantic contribution?

# **4.2.4 Movement for more local phenomena like scrambling, passive, and raising**

We want now to consider the dependencies that Minimalism analyzes in terms of Move/Internal Merge. In the next section we look at unbounded dependencies, but first we consider local dependencies in passives, unaccusatives, raising sentences, and scrambling. The following illustrate the first three of these:

(43) a. Kim has been elected.


<sup>33</sup>Attested example from the newspaper taz, 05.09.2018, p. 20, quoted from Müller (2021e: 1533).


These differ from unbounded dependency constructions in that, whereas the gaps in the latter are positions in which overt NPs can appear, this is not true of the supposed gap positions in (43):

	- b. \* It has disappeared Kim.
	- c. \* It seems Kim to be clever.

This is a complication if they involve the same mechanism, but is unsurprising if they involve different mechanisms, as in HPSG and most other frameworks.

# 4.2.4.1 Passive

In the classical analysis of the passive in MGG, it is assumed that the morphology of the participle suppresses the agent role and removes the ability to assign accusative case. In order to receive case the underlying object has to move to the subject position, i.e., Spec TP, where it gets the nominative case (Chomsky 1981: 124).

	- b. [The girl] was given [a cookie] (by the mother).

The analysis assumed in recent Minimalist work differs in detail but is movement-based like its predecessors. While movement-based approaches seem to work well for SVO languages like English, they are problematic for SOV languages like German. To see why, consider the examples in (46), which are based on an observation by Lenerz (1977: Section 4.4.3):

	- b. weil dem Jungen der Ball geschenkt wurde
because the.DAT boy the.NOM ball given was
'because the ball was given to the boy'
	- c. weil der Ball dem Jungen geschenkt wurde
because the.NOM ball the.DAT boy given was

In comparison to (46c), (46b) is the unmarked order (Höhle 1982). *der Ball* 'the ball' in (46b) occurs in the same position as *den Ball* in (46a); that is, no movement is necessary. Only the case differs. (46c) is, however, somewhat marked in comparison to (46b). So, if one assumed (46c) to be the normal order for passives, with (46b) derived from it by movement of *dem Jungen* 'the boy', then (46b) should be more marked than (46c), contrary to the facts. To solve this problem, an analysis involving abstract movement has been proposed for cases such as (46b): the elements stay in their positions, but are connected to the subject position and receive their case information from there. Grewendorf (1995: 1311) assumes that there is an empty expletive pronoun in the subject position of sentences such as (46b), as well as in the subject position of sentences with an impersonal passive such as (47):<sup>34</sup>

(47) weil heute nicht gearbeitet wird
because today not worked is
'because there will be no work done today'

A silent expletive pronoun is something that one cannot see or hear and that does not carry any meaning. Such entities are not learnable from the input; hence, innate domain-specific knowledge would have to be assumed, and approaches that do not have to assume such very specific innate knowledge are of course preferable. For further discussion of language acquisition, see Section 5.2.

HPSG does not have this problem, since the passive is treated by lexical rules that map verbal stems onto participle forms with a reduced argument structure list (Pollard & Sag 1987: 215; Müller 2003b; Müller & Ørsnes 2013; Davis, Koenig & Wechsler 2021: Section 5.3, Chapter 9 of this volume). The first element (the subject in the active voice) is suppressed so that the second element (if there is any) becomes the first. In SVO languages like English and Icelandic, this element is realized before the verb: there is a valence feature for subjects/specifiers, and items that are realized with the respective schema are serialized to the left of the verb. In SOV languages like German and Dutch, the subject is treated like other arguments, and hence it is not put in a designated position before the finite verb (Müller 2021b: Section 4, Chapter 10 of this volume). No movement is involved in this valence-based analysis of the passive. The problem of MGG analyses is that they mix two phenomena: passive and subject requirement. Since these two phenomena are kept separate in HPSG, problems like the one discussed above can be avoided. See Müller (2016: Section 3.4 and Chapter 20) for further discussion.
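To make the mechanics of such a lexical rule concrete, here is a minimal sketch, assuming a drastically simplified argument-structure list that records only case values (the type and function names are ours, invented for illustration; this is not code from any actual HPSG implementation):

```haskell
-- A minimal sketch for illustration only (not an actual HPSG grammar
-- fragment): the valence-based passive as a lexical rule over a simplified
-- argument-structure list containing only case values.
data Case = Nom | Acc | Dat deriving (Eq, Show)

type ArgSt = [Case]

-- Suppress the first (most prominent) argument; a remaining structural
-- accusative is realized as nominative, while a lexical dative is preserved.
passivize :: ArgSt -> ArgSt
passivize []         = []
passivize (_ : rest) = map promote rest
  where
    promote Acc = Nom
    promote c   = c

main :: IO ()
main = do
  print (passivize [Nom, Dat, Acc])  -- [Dat,Nom]: cf. (46b) 'dem Jungen der Ball geschenkt wurde'
  print (passivize [Nom])            -- []: the impersonal passive, cf. (47)
```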

### 4.2.4.2 Scrambling

Discussing the passive, we already touched on problems related to local reordering of arguments, so-called *scrambling*. In what follows, we want to discuss scrambling in more detail. Languages like German have a freer constituent order than English.

<sup>34</sup>See Koster (1986: 11–12) for a parallel analysis for Dutch as well as Lohnstein (2014: 180) for a movement-based account of the passive that also involves an empty expletive for the analysis of the impersonal passive.

A sentence with a ditransitive verb allows for six permutations of the arguments, two of which are given in (48):

	- b. [weil] das Buch der Mann der Frau gibt
because the.ACC book the.NOM man the.DAT woman gives

Figure 17: The analysis of local reordering as movement to Spec TP and the "base-generation" analysis assumed in HPSG

It has long been argued that scrambling should be handled as movement as well (Frey 1993). An argument that has often been used to support the movement-based analysis is the fact that scope ambiguities exist in sentences with reorderings which are not present in sentences in the base order. The explanation of such ambiguities comes from the assumption that the scope of quantifiers can be derived from their position in the surface structure as well as their position in the underlying structure. If the position in both the surface and the deep structure is the same, that is, when there has not been any movement, then only one reading is possible. If movement has taken place, however, then there are two possible readings (Frey 1993: 185):


(49) a. Es ist nicht der Fall, daß er mindestens einem Verleger fast jedes Gedicht anbot.
it is not the case that he at.least one publisher almost every poem offered
'It is not the case that he offered at least one publisher almost every poem.'

b. Es ist nicht der Fall, daß er fast jedes Gedicht mindestens einem Verleger \_ anbot.
it is not the case that he almost every poem at.least one publisher offered
'It is not the case that he offered almost every poem to at least one publisher.' or 'It is not the case that he offered at least one publisher almost every poem.'

(49a) is unambiguous with *at least one* scoping over *almost every* but (49b) has two readings: one in which *almost every* scopes over *at least one* (surface order) and one in which *at least one* scopes over *almost every* (reconstructed underlying order).

It turns out that approaches assuming traces run into problems, as they predict certain readings which do not exist for sentences with multiple traces (see Kiss 2001: 146 and Fanselow 2001: Section 2.6). For instance, in an example such as (50), it should be possible to interpret *mindestens einem Verleger* 'at least one publisher' at the position of \_, which would lead to a reading where *fast jedes Gedicht* 'almost every poem' has scope over *mindestens einem Verleger* 'at least one publisher'. However, this reading does not exist.

(50) Ich glaube, dass mindestens einem Verleger fast jedes Gedicht nur dieser Dichter \_ \_ angeboten hat.
I believe that at.least one publisher almost every poem only this poet offered has
'I think that only this poet offered almost every poem to at least one publisher.'

The alternatives to movement-based approaches are so-called "base-generation" approaches in which the respective orders are derived directly. Fanselow (2001), working within the Minimalist Program, suggests such an analysis in which arguments can be combined with their heads in any order. This is the HPSG analysis that was suggested by Gunji (1986: Section 4.1) for Japanese and is standardly used in HPSG grammars of German (Hinrichs & Nakazawa 1989: 8; Kiss 1995: 221; Meurers 1999: 199; Müller 2005a: 7; 2021c). See also Müller (2021b: 379–380), Chapter 10 of this volume.
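A minimal sketch of the combinatorics (ours for illustration only; the function name is invented): if a verb-final head may discharge its arguments in any order, all six serializations in (48) are licensed directly, with no movement operations:

```haskell
import Data.List (permutations)

-- A sketch for illustration only (not an actual HPSG fragment): if a
-- verb-final head may combine with its arguments in any order, all n!
-- serializations of the argument list are licensed directly.
scramblings :: String -> [String] -> [[String]]
scramblings verb args = [perm ++ [verb] | perm <- permutations args]

main :: IO ()
main = mapM_ (putStrLn . unwords)
             (scramblings "gibt" ["der Mann", "der Frau", "das Buch"])
-- prints the six orders of the arguments of 'gibt', two of which are (48)
```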

Sauerland & Elbourne (2002: 308) discuss analogous examples from Japanese, which they credit to Kazuko Yatsushiro. They develop an analysis in which the first step is to move the accusative object in front of the subject. Then, the dative object is placed in front of that, and then, in a third movement, the accusative is moved once more. The last movement either builds a structure that is later passed to LF, or it is a movement that builds the Phonological Form; in the latter case, it has no semantic effects. While this analysis can predict the correct available readings, it does require a number of additional movement operations with intermediate steps.

# **4.2.5 Nonlocal dependencies**

Having dealt with the more local phenomena that Minimalism treats via Move/Internal Merge, we now turn to genuine nonlocal dependencies and compare the Move/Internal Merge approach to the one in HPSG.

# 4.2.5.1 Gaps without filler

The Move/Internal Merge approach seems quite plausible for typical examples of an unbounded dependency, but issues arise with less typical examples. Within this approach, one expects to see a clause-initial filler constituent and a gap somewhere in the following clause. This is what we commonly find, but there are unbounded dependency constructions in which there is a gap but no visible higher constituent matching it. Consider, e.g., the following:

	- b. Lee is too important [for you to talk to \_].
	- c. Lee is important enough [for you to talk to \_].
	- d. Kim is easy [for anyone to talk to \_].

Within Minimalist assumptions, it is more or less necessary to assume that such examples contain an invisible filler (a so-called empty operator). Unless there is some independent evidence for such invisible fillers, they are little more than an ad hoc device to maintain the Move/Internal Merge approach. Within the HPSG SLASH-based approach to unbounded dependencies, there is no assumption that there should always be a filler at the top of an unbounded dependency (Pollard & Sag 1994: Chapter 4, see also Borsley & Crysmann 2021: 555–557, Chapter 13 of this volume). Hence, the examples in (51) are completely unproblematic.


### 4.2.5.2 Filler without gaps: Resumptive pronouns

There are also unbounded dependency constructions which seem to have not a gap but a resumptive pronoun (RP). Among the many languages that are relevant here is Welsh, which has RPs in both *wh*-interrogatives and relative clauses, as the following, in which the resumptive pronouns are italicized, illustrate:

	- b. y dyn werthodd Ieuan y ceffyl iddo *fo*
the man sell.PST.3SG Ieuan the horse to.3SG.M he
'the man that Ieuan sold the horse to'

Willis (2011) and Borsley (2010; 2013) present evidence that Welsh RPs involve the same mechanism as gaps. Within Minimalism, this means that they must involve Move/Internal Merge.<sup>35</sup> But one expects to see a gap where Move/Internal Merge has applied. One Minimalist response suggests that instead of being deleted, the copy left behind by Move/Internal Merge is somehow turned into a pronoun (see McCloskey 2006: 110). A problem for this approach is that it makes it surprising that RPs universally look like ordinary pronouns (McCloskey 2002). Another approach exploits the complexity of Minimalist structures and proposes that there is a gap in the structure somewhere near the RP. Thus, for example, Willis (2011: 216) proposes that examples like those in (52) with an RP in prepositional object position have a coindexed operator in the specifier position of PP, which undergoes movement. Similar approaches are outlined in Aoun et al. (2001) and Boeckx (2003). For detailed objections to both approaches, see Borsley (2013: Section 3). Within the SLASH-based approach of HPSG, there is no reason to think that there will always be a gap at the bottom of a dependency, and it is not difficult to accommodate RPs. See Vaillette (2002), Taghvaipour (2010), Borsley (2013), and Crysmann (2012; 2016) for slightly different approaches.<sup>36</sup>

<sup>35</sup>Rouveret (2008) sketches a Minimalist analysis of Welsh RPs which does not involve movement. For criticisms of this analysis, see Borsley (2015: 13–14).

<sup>36</sup>Also relevant here are examples with more than one gap such as the following:

	- b. Which book did you criticize \_ without reading \_?

There have been various attempts to accommodate such examples within the Move/Internal Merge approach, but it is not clear that any of them is satisfactory. In contrast, such examples are expected within the SLASH-based approach (Levine & Sag 2003). See also Pollard & Sag (1994: Section 4.6).


See also Borsley & Crysmann (2021), Chapter 13 of this volume, for a more detailed discussion of nonlocal dependencies; for further comparison between the HPSG and Minimalist approaches to unbounded dependencies, see Chaves & Putnam (2020: Chapters 4 and 5).

# **4.3 Conclusion**

Thus, there is a variety of phenomena which suggest that the Minimalist view of constituent structure is too simple. The restriction to binary branching, the assumption that all structures are headed, and Move/Internal Merge all seem problematic. It looks, then, as if the Minimalist view is both too complex and too simple.

# **5 Psycholinguistic issues**

Although they differ in a variety of ways, HPSG and Minimalism agree that grammatical theory is concerned with linguistic knowledge. They focus first and foremost on the question: what form does linguistic knowledge take? But there are other questions that arise here, notably the following:

- How is linguistic knowledge put to use in producing and comprehending utterances?
- How is linguistic knowledge acquired?
Both questions are central concerns for psycholinguistics. Thus, in considering the answers that HPSG and Minimalism can give, we are considering their relevance to psycholinguistics. Chomskyan approaches, including Minimalism, have focused mainly on the second question and have paid little attention to the first. HPSG has had more to say about the first and has shown less interest in the second. However, there is a large body of work on acquisition in Construction Grammar, and since HPSG is a constructionist theory (Müller 2021e, Chapter 32 of this volume) all the insights carry over to HPSG. Clearly an adequate grammatical theory should be able to give satisfactory answers to both questions. In this section we will look briefly at the relation of the two theories to processing and then consider more fully their relation to acquisition.

# **5.1 Processing**

We noted in Section 3 that whereas HPSG is a declarative or constraint-based approach to grammar, Minimalism has a procedural view of grammar. This contrast means that HPSG is much more suitable than Minimalism for incorporation into an account of the processes that are involved in linguistic performance.<sup>37</sup>

The most obvious fact about linguistic performance is that it involves both production and comprehension. As noted in Section 3, this suggests that the knowledge that is used in production and comprehension should have a declarative character as in HPSG and not a procedural character as in Minimalism.

A second important feature of linguistic performance is that it involves different kinds of information, utilized in whatever order is necessary. Sag & Wasow (2011: 367–368) illustrate this with the following examples:

	- a. The sheep that was sleeping in the pen stood up.
	- b. The sheep in the pen had been sleeping and were about to wake up.

In (53a), morphological information determines the number of sheep before non-linguistic information determines that *pen* means 'fenced enclosure' and not 'writing implement'. In (53b), on the other hand, non-linguistic information determines that *pen* means 'fenced enclosure' before morphological information determines the number of sheep. This is unproblematic for an approach like HPSG in which linguistic and non-linguistic knowledge takes the form of constraints which are not ordered in any way.<sup>38</sup> It is quite unclear how the facts can be accommodated within Minimalism, given that linguistic knowledge with its procedural form is quite different from non-linguistic knowledge.

Other features of HPSG also make it attractive from a processing point of view. Firstly, there is the fact emphasized earlier that linguistic expressions have a single relatively simple constituent structure with a minimum of phonologically empty elements. Secondly, there is the fact that all constraints are purely local and never affect anything larger than the immediate tree consisting of an expression and its daughters. Both these properties make processing easier than it would otherwise be. Minimalism has neither property and hence again seems less satisfactory than HPSG in this area.

Someone might suppose that the fact that Minimalism treats linguistic knowledge as knowledge about how to construct syntactic structures means that it is well-suited for incorporation into accounts of linguistic performance. In fact this is not at all the case. The way standard Minimalism<sup>39</sup> constructs syntactic structures is quite unlike the way speakers and hearers construct them.

<sup>37</sup>See Bresnan & Kaplan (1982) for an early argument that an approach which can be readily incorporated into an account of linguistic performance is preferable to one which cannot.

<sup>38</sup>See also Lücking (2021), Chapter 27 of this volume on the interaction of gesture and speech.

<sup>39</sup>For a discussion of non-standard versions like Phillips (2003) and Chesi (2015), see Sag & Wasow (2011: Section 10.5) and Müller (2020: 527).


Speakers begin with representations of meanings they want to communicate and gradually turn them into an appropriate sequence of sounds, constructing whatever syntactic structures are necessary to do this. Hearers in contrast begin with a sequence of sounds from which they attempt to work out what meanings are being communicated. To do this, they have to segment the sounds into words and determine what sorts of syntactic structures the words are involved in. Language processing is incremental and all channels are used in parallel (Marslen-Wilson 1975; Tanenhaus et al. 1995; 1996). Information about phonology, morphosyntax, semantics, information structure, and even world knowledge (as in the examples (53) above) is used as soon as it is available. Hence, parsing (54) is an incremental process: the hearer hears *Kim* first, and as soon as the first sounds of *may* reach her, the available information is integrated and hypotheses regarding further parts of the utterance are built.<sup>40</sup>

(54) Kim may go to London.

The construction of syntactic structures within Minimalism is a very different matter. It begins with a set of words, and they are gradually assembled into a syntactic structure, from which representations of sound and meaning can be derived, either once a complete structure has been constructed or at the end of each Phase, if the derivation is broken up into Phases. Moreover, the nature of English means that the construction of a syntactic structure essentially proceeds from right to left. Consider the analysis of (54): here, *go* can only be integrated into the structure after its complement *to London* has been constructed, and *may* can only be integrated into the structure after the construction of its complement *go to London*, and only after that can *Kim* be integrated into the structure. This is quite different from the construction of syntactic structures by speakers and hearers, which proceeds from left to right.

These issues have led researchers like Phillips (2003) and Chesi (2015) to propose rather different versions of Minimalism. However, they are still procedural approaches, and they have the problem that any system of procedures which resembles what speakers do will be very different from what hearers do, and vice versa.

<sup>40</sup>Note that the architecture in Figure 5 poses additional problems. A *numeration* is a selection of lexical items that is used in a derivation. Since a multitude of empty elements is assumed in Minimalist analyses, it is unclear how such a numeration is constructed, since it cannot be predicted at the lexical level which empty elements will be needed in the course of a derivation. Due to the empty elements, there may be infinitely many possible numerations that might be appropriate for the analysis of a given input string. For processing regimes, this raises the question of how the different numerations are involved in processing. Are all these numerations worked with in parallel? This would be rather implausible due to limitations in short-term memory.


The right response to the problems outlined above is not a different procedural version of Minimalism but a declarative version, neutral between production and comprehension. It would probably not be difficult to develop a declarative version of the framework. It would presumably have an external merge phrase type and an internal merge phrase type, both subject to appropriate constraints. This would be better from a processing point of view than any procedural version of Minimalism. However, the complexity of its structures and the fact that its constraints are not purely local would still make it less satisfactory than HPSG in this area. For further discussion of how HPSG and Minimalism compare with respect to processing, see Chaves & Putnam (2020: Chapters 4 and 5).

# **5.2 Acquisition**

Acquisition has long been a central concern for Chomskyans, who argue that acquisition is made possible by the existence of a complex innate language faculty (Chomsky 1965: Section I.8). Since the early 1980s, the dominant view has been that the language faculty consists of a set of principles responsible for the properties which languages share and a set of parameters responsible for the ways in which they may differ (Chomsky 1981: 6). On this view, acquiring a grammatical system is a matter of parameter-setting (Chomsky 2000: 8). Proponents of HPSG have always been sceptical about these ideas (see, e.g., the remarks about parameters in Pollard & Sag 1994: 31) and have favoured accounts with "an extremely minimal initial ontology of abstract linguistic elements and relations" (Green 2011: 378). Thus, the two frameworks appear to be very different in this area. It is not clear, however, that this is really the case.

The idea that acquiring a grammatical system is a matter of parameter-setting is only as plausible as the idea of a language faculty with a set of parameters. It seems fair to say that this idea has not been as successful as was hoped when it was first introduced in the early 1980s. Outsiders have always been sceptical, but they have been joined in recent times by researchers sympathetic to many Chomskyan ideas. Thus, Newmeyer (2005: 75) writes as follows:

[…] empirical reality, as I see it, dictates that the hopeful vision of UG as providing a small number of principles each admitting of a small number of parameter settings is simply not workable. The variation that one finds among grammars is far too complex for such a vision to be realized.


Some Minimalists have come to similar conclusions. Thus, Boeckx (2011: 206) suggests that:

some of the most deeply-embedded tenets of the Principles-and-Parameters approach, and in particular the idea of Parameter, have outlived their usefulness.

Much the same view is expressed in Hornstein (2009: 164–168).

A major reason for scepticism about parameters is that estimates of how many there are seem to have steadily increased. Fodor (2001: 734) considers that there might be just twenty parameters, so that acquiring a grammatical system is a matter of answering twenty yes/no questions. Newmeyer (2005: 44) remarks that "I have never seen any estimate of the number of binary-valued parameters needed to capture all of the possibilities of core grammar that exceeded a few dozen". However, Roberts & Holmberg (2005) comment that "[n]early all estimates of the number of parameters in the literature judge the correct figure to be in the region of 50–100". Clearly, a hundred is a lot more than twenty. Newmeyer (2017: Section 6.3) speaks of "hundreds, if not thousands". This is worrying. As Newmeyer (2006: 6) observes, "it is an ABC of scientific investigation that if a theory is on the right track, then its overall complexity decreases with time as more and more problematic data fall within its scope. Just the opposite has happened with parametric theory. Year after year more new parameters are proposed, with no compensatory decrease in the number of previously proposed ones".

The growing scepticism appears to tie in with the proposal by Hauser, Chomsky & Fitch (2002: 1573) that "FLN [the 'Faculty of language–narrow sense'] comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces". On this view, there seems to be no place for parameters within FLN. This conclusion is also suggested by Chomsky's remarks (2005) that "[t]here is no longer a conceptual barrier to the hope that the UG might be reduced to a much simpler form" (p. 8) and that "we need no longer assume that the means of generation of structured expressions are highly articulated and specific to language" (p. 9). It's hard to see how such remarks are compatible with the assumption that UG includes 50–100 parameters. But if parameters are not part of UG, it is not at all clear what their status might be.

It looks, then, as if Minimalists are gradually abandoning the idea of parameters. But if it is abandoned, grammar acquisition is not a matter of parameter-setting. Hence, it is not clear that Minimalists can invoke any mechanisms that are not available to HPSG.


This might suggest that HPSG and Minimalism are essentially in the same boat where acquisition is concerned. However, this is not the case, given the very different nature of grammatical systems in the two frameworks. The complex and abstract structures that are the hallmark of Minimalism and earlier transformational frameworks pose major problems for acquisition. Furthermore, the machinery that is assumed in addition to the basic operations Internal and External Merge is by no means trivial. There are numerations (subsets of the lexicon) that are assumed to play a role in a derivation, as well as Agree; restrictions on possible probe/goal relations have to be acquired, as does knowledge of which features are interpretable and which are uninterpretable. Certain categories are Phase boundaries, others are not. There are complex conditions on labelling. It is this that has led to the assumption that acquisition must be assisted by a complex language faculty. In contrast, HPSG structures are quite closely related to the observable data and so pose less of a problem for acquisition, hence creating less need for some innate apparatus. Thus, HPSG probably has an advantage over Minimalism in this area too. For further discussion of HPSG and acquisition, including L2 acquisition, see Chaves & Putnam (2020: Chapter 7).

There is one further formal aspect that sets HPSG apart from Minimalism and that is relevant for theories of acquisition: HPSG uses typed feature descriptions and the types are organized in hierarchies (see Richter 2021, Chapter 3 of this volume). It is known from research on language acquisition and general cognition that humans classify objects, including linguistic ones (Lakoff 1987; Goldberg 2003; Hudson 2007: 5). While HPSG has the technical machinery to cover this and to represent generalizations (Flickinger, Pollard & Wasow 1985; Pollard & Sag 1987; 1994; Sag 1997), work in MGG usually frowns upon anything coming near the idea of taxonomies (Chomsky 1965: 57, 67; 2008: 135).

# **5.3 Restrictiveness**

There is one further issue that we should discuss here. It appears to be quite widely assumed that one advantage that Minimalism has over alternatives like HPSG is that it is more "restrictive", in other words that it makes more claims about what is and is not possible in language. It looks, then, as if there might be an argument for Minimalism here. It is not clear, however, that this is really the case.

Minimalism would be a restrictive theory making interesting claims about language if it assumed a relatively small number of parameters. However, the idea that there is just a small number of parameters seems to have been abandoned, and at least some Minimalists have abandoned the idea of parameters altogether (see Section 5.2). If there is either a large number of parameters or no parameters at all, Minimalism is not restrictive in the way that it once was. However, it does still embody some restrictions on grammatical systems. The assumption that syntactic structures are confined to binary branching is an important restriction, as is the assumption that expressions produced by Merge have the same label as one of the expressions that they consist of. But we have argued that both assumptions are quite dubious. It also seems to be assumed that case and agreement are features of all grammatical systems. This would be another important restriction, but this also seems dubious given that many languages show no clear evidence for one or both of these features. It looks to us, then, as if the restrictiveness of Minimalism is largely a matter of imposing certain dubious restrictions on grammatical systems.

Note also that there are problems with restrictiveness of a more formal nature. Earlier versions of MGG assumed X̄ theory, and although this was not assumed initially, it was quickly argued that the X̄ schema is universal and that this is a restriction on grammatical systems that aids language acquisition (Haegeman 1994: 106). However, Kornai & Pullum (1990: 41, 47) show that X̄ theory is not restrictive at all as soon as empty elements are allowed in grammars: all languages that can be analyzed with a context-free grammar can be analyzed with an X̄ grammar with empty heads. Chomsky (1995b: Section 4.3) abandoned X̄ theory and replaced it by notions like first-merged and later-merged (Chomsky 1995b: 245; 2008), but the principled problem remains. Since as many empty heads as needed can be assumed in any position, the predictions as far as restrictiveness is concerned are limited. See also Hornstein (2009: 165) and Starke (2014: 140) on heads, features, and restrictiveness.

An example that is usually discussed when it comes to restrictiveness is question formation (Musso et al. 2003). Researchers in MGG state that certain ways of expressing things never occur, although they may be imaginable. So some may ask why questions are never formed by reversing the order of words in a string. So rather than (55b), the question that would correspond to (55a) would be (55c):

	- a. Kim saw Sandy near the swimming pool.
	- b. Did Kim see Sandy near the swimming pool?
	- c. Pool swimming the near Sandy saw Kim?

Interestingly, such reorderings can be derived in systems that allow for so-called *remnant movement*, as Hubert Haider (p. c. 2018) pointed out. Remnant movement analyses are sometimes suggested for partial verb phrase fronting (G. Müller 1998). In the analysis of the following sentence, the object of *gelesen* 'read' is moved out of the VP and the VP remnant is then fronted:


(56) [VP \_ Gelesen] hat [das Buch] [keiner \_].
read has the book nobody
'Nobody read the book.'

With such a system in place, the reorderings can be derived as follows: element 3 is combined with 4, and 4 moves to the left of 3. The result is combined with 2, and then the unit containing 3 and 4 moves to the left of 2. [[4 [3 \_]] [2 \_]] is then combined with 1 and moved to the left of 1.

(57) a. [1 [2 [3 4]]]
b. [3 4] → [4 [3 \_]] → [2 [4 [3 \_]]] → [[4 [3 \_]] [2 \_]] → [1 [[4 [3 \_]] [2 \_]]] → [[[4 [3 \_]] [2 \_]] [1 \_]]
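The following minimal sketch (ours for illustration only; the function name is invented) makes Haider's point concrete: the merge-and-front schema in (57) computes exactly string reversal:

```haskell
-- A sketch for illustration only: the remnant-movement schema in (57)
-- ("merge the next head, then front the previously derived remnant")
-- computes exactly string reversal.
remnantReverse :: [a] -> [a]
remnantReverse []       = []
remnantReverse (x : xs) = remnantReverse xs ++ [x]
-- merge x with the already-fronted remnant of xs, then front that remnant

main :: IO ()
main = putStrLn (unwords (remnantReverse
         (words "Kim saw Sandy near the swimming pool")))
-- "pool swimming the near Sandy saw Kim", cf. (55c)
```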

Of course, there are reasons for the absence of certain imaginable constructions in the languages of the world. The reason for the absence of question formation like (55c) is simply short-term memory. Operations like those are ruled out due to performance constraints and hence should not be modelled in competence grammars. So it is unproblematic that remnant movement systems allow the derivation of strings with reverse order, and it is unproblematic that one might develop HPSG analyses that reverse strings. Similarly, certain other restrictions have been argued not to be part of the grammar proper. For instance, Subjacency (Baltin 1981: 262; 2006; Rizzi 1982: 57; Chomsky 1986: 38–40) does not hold in the form stated in MGG (Müller 2004; 2016: Section 13.1.5) and it is argued that several of the island constraints should not be modelled by hard constraints in competence grammars. See Chaves (2021), Chapter 15 of this volume for further discussion.

It is true that the basic formalism does not pose any strong restrictions on what could be said in an HPSG theory. As Pollard (1997) points out, this is the way it should be. The formalism should not be the constraining factor. It should be powerful enough to allow everything to be expressed in insightful ways and in fact, the basic formalism of HPSG has Turing power, the highest power in the Chomsky hierarchy (Pollard 1999). This means that the general formalism is above the complexity that is usually assumed for natural languages, namely mildly context-sensitive. What is important, though, is that theories of individual languages are much more restrictive, getting the generative power down (Müller 2016: Chapter 17).

These remarks should not be understood as a suggestion that languages vary without limit, as Joos (1958: 96) suggested. No doubt there are universal tendencies and variation is limited, but the question is whether this is due to innate linguistic constraints or a consequence of what we do with language and how our general cognitive capabilities are structured. While Minimalism starts out with claims about universal features of languages and tries to confirm these claims in language after language, researchers working in HPSG aim to develop fragments of languages that are motivated by facts from these languages and generalize over several internally motivated grammars. This leaves open the option that languages have very little in common as far as syntax is concerned. For example, Koenig & Michelson (2012) discuss the Northern Iroquoian language Oneida and argue that this language does not have syntactic valence. If they are correct, not even central concepts like valence and argument structure would be universal. The only remaining universal would be that we combine linguistic objects. This corresponds to Merge in Minimalism, without the restriction to binarity.

# **6 Conclusion**

We have looked in this chapter at the variety of ways in which HPSG and the Minimalist framework differ. We have considered a number of differences of approach and outlook, including different attitudes to formalization and empirical data. We have highlighted different views of what grammar is, especially contrasting the HPSG declarative approach and the Minimalist derivational approach. We have also explored the very different views of syntactic structure that prevail in the two frameworks, emphasizing both the many ways in which Minimalist structures are more complex and the ways in which they are simpler. Finally, we have looked at psycholinguistic issues, considering both processing and acquisition. In all these areas we have found reasons for favouring HPSG. We conclude, then, that HPSG is the more promising of the two frameworks.

# **Acknowledgments**

We thank Anne Abeillé, Jean-Pierre Koenig, and Tom Wasow for discussion and comments on an earlier version of the paper. We thank David Adger and Andreas Pankau for discussion and Sebastian Nordhoff and Felix Kopecky for help with a figure.



# **Chapter 29**

# **HPSG and Categorial Grammar**

# Yusuke Kubota

National Institute for Japanese Language and Linguistics

This chapter aims to offer an up-to-date comparison of HPSG and Categorial Grammar (CG). Since the CG research itself consists of two major types of approaches with overlapping but distinct goals and research strategies, I start by giving an overview of these two variants of CG. This is followed by a comparison of HPSG and CG at a broad level, in terms of the general architecture of the theory, and then, by a more focused comparison of specific linguistic analyses of some selected phenomena. The chapter ends by briefly touching on issues related to computational implementation and human sentence processing. Throughout the discussion, I attempt to highlight both the similarities and differences between HPSG and CG research, in the hope of stimulating further research in the two research communities on their respective open questions, and so that the two communities can continue to learn from each other.

# **1 Introduction**

The goal of this chapter is to provide a comparison between HPSG and Categorial Grammar (CG). The two theories share certain important insights, mostly due to the fact that they are among the so-called *lexicalist*, *non-transformational* theories of syntax that were proposed as major alternatives to mainstream transformational syntax in the 1980s (see Borsley & Börjars 2011 and Müller 2019 for overviews of these theories). However, due to the differences in the main research goals in the respective communities in which these approaches have been developed, there are certain nontrivial differences between them as well. The present chapter assumes researchers working in HPSG or other non-CG theories of syntax as its main audience, and aims to inform them of key aspects of CG which make it distinct from other theories of syntax. While computational implementation and investigations of the formal properties of grammatical theory have been important in both HPSG and CG research, I will primarily focus on the linguistic aspects in the ensuing discussion, with pointers (where relevant) to literature on mathematical and computational issues. Throughout the discussion, I presuppose basic familiarity with HPSG (with pointers to relevant chapters in the handbook). The present handbook contains chapters that compare HPSG with other grammatical theories, including the present one. I encourage the reader to take a look at the other theory comparison chapters too (as well as other chapters dealing with specific aspects of HPSG in greater detail), in order to obtain a fuller picture of the theoretical landscape in current (non-transformational) generative syntax research.

Yusuke Kubota. 2021. HPSG and Categorial Grammar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1331–1394. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599876

The rest of the chapter is structured as follows. I start by giving an overview of Combinatory Categorial Grammar and Type-Logical Categorial Grammar, two major variants of CG (Section 2). This is followed by a comparison of HPSG and CG at a broad level, in terms of the general architecture of the theory (Section 3), and then, by a more focused comparison of specific linguistic analyses of some selected phenomena (Section 4). The chapter ends by briefly touching on issues related to computational implementation and human sentence processing (Section 5).

# **2 Two varieties of CG**

CG is actually not a monolithic theory, but is a family of related approaches – or, perhaps more accurately, it is much *less of* a monolithic theory than either HPSG or Lexical Functional Grammar (LFG; Kaplan & Bresnan 1982; Bresnan et al. 2016; Wechsler & Asudeh 2021, Chapter 30 of this volume) is. For this reason, I will start my discussion by sketching some important features of two major varieties of CG, Combinatory Categorial Grammar (CCG; Steedman 2000; 2012) and Type-Logical Categorial Grammar (TLCG; or *Type-Logical Grammar*; Morrill 1994; Moortgat 2011; Kubota & Levine 2020).<sup>1</sup> After presenting the "core" component of CG that is shared between the two approaches – which is commonly referred to as the *AB Grammar* – I introduce aspects of the respective approaches in which they diverge from each other.

# **2.1 Notation and presentation**

Before getting started, some comments are in order as to the notation and the mode of presentation adopted. Two choices are made for the notation. First, CCG and TLCG traditionally adopt different notations of the slash.

<sup>1</sup>For more detailed introductions to these different variants of CG, see Steedman & Baldridge (2011) (on CCG) and Oehrle (2011) (on TLCG), both included in Borsley & Börjars (2011).


I stick to the TLCG notation throughout this chapter for notational consistency. Second, I present all the fragments below in the so-called *labeled deduction* notation of (Prawitz-style) natural deduction. In particular, I follow Oehrle (1994) and Morrill (1994) in the use of "term labels" in labeled deduction to encode prosodic and semantic information of linguistic expressions. This involves writing linguistic expressions as *tripartite signs*, formally, tuples of prosodic form, semantic interpretation and syntactic category (or syntactic type). Researchers familiar with HPSG should find this notation easy to read and intuitive; the idea is essentially the same as how linguistic signs are conceived of in HPSG. In the CG literature, this notation has its roots in the conception of "multidimensional" linguistic signs in earlier work by Dick Oehrle (1988). But the reader should be aware that this is *not* the standard notation in which either CCG or TLCG is typically presented.<sup>2</sup> Also, logically savvy readers may find this notation somewhat confusing since it (unfortunately) obscures certain aspects of CG pertaining to its logical properties. In any event, it is important to keep in mind that different notations co-exist in the CG literature (and the logic literature behind it), and that, just as in mathematics in general, different notations can be adopted for the same formal system to highlight different aspects of it in different contexts. As noted in the introduction, for the mode of presentation, the emphasis is consistently on linguistic (rather than computational or logical) aspects. Moreover, I have taken the liberty to gloss over certain minor differences among different variants of CG for the sake of presentation. The reader is therefore encouraged to consult primary sources as well, especially when details matter.

# **2.2 The AB Grammar**

I start with a simple fragment of CG called the *AB Grammar*, consisting of just the two syntactic rules in (1) (here, ◦ designates string concatenation):

$$\text{(1) a. Forward Slash Elimination: } \dfrac{\mathsf{a};\, A/B \qquad \mathsf{b};\, B}{\mathsf{a} \circ \mathsf{b};\, A}\ /\mathrm{E} \qquad \text{b. Backward Slash Elimination: } \dfrac{\mathsf{b};\, B \qquad \mathsf{a};\, B\backslash A}{\mathsf{b} \circ \mathsf{a};\, A}\ \backslash\mathrm{E}$$
With the somewhat minimal lexicon in (2), the sentence *John loves Mary* can be licensed as in (3). The two slashes / and \ are used to form "complex" syntactic categories (more on this below) indicating valence information: the transitive verb *loves* is assigned the category (NP\S)/NP since it first combines with an NP to its right (i.e. the direct object) and then another NP to its left (i.e. the subject).

(2) a. john; NP
	b. mary; NP
	c. ran; NP\S
	d. loves; (NP\S)/NP

$$\text{(3) } \dfrac{\mathsf{john};\, \mathrm{NP} \qquad \dfrac{\mathsf{mary};\, \mathrm{NP} \qquad \mathsf{loves};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP}}{\mathsf{loves} \circ \mathsf{mary};\, \mathrm{NP}\backslash\mathrm{S}}\ /\mathrm{E}}{\mathsf{john} \circ \mathsf{loves} \circ \mathsf{mary};\, \mathrm{S}}\ \backslash\mathrm{E}$$

In the notation adopted here, the linear order of words is explicitly represented in the prosodic component of each derived sign. Thus, just like the analysis trees in Linearization-based HPSG (see Müller 2021b: Section 6, Chapter 10 of this volume for an overview), the left-to-right order of elements in the proof tree does not necessarily correspond to the surface order of words. The object NP *Mary* is deliberately placed on the left of the transitive verb *loves* in the proof tree in (3) in order to underscore this point.

At this point, the analysis in (3) is just like the familiar PSG analysis of the form in Figure 1, except that the symbol VP is replaced by NP\S.

Figure 1: PSG analysis of *John loves Mary*
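For readers who find executable notation helpful, here is a minimal sketch of the AB fragment above (my illustration only; the type and function names are invented and this is not code from the CG literature):

```haskell
-- A minimal executable sketch of the AB Grammar: categories are basic or
-- slash-formed, and the two Slash Elimination rules in (1) combine signs,
-- concatenating their prosodic forms.
data Cat = NP | S | Cat :/ Cat | Cat :\ Cat deriving (Eq, Show)

type Sign = (String, Cat)  -- (prosodic form, syntactic category)

-- Forward Slash Elimination (/E): a; A/B  +  b; B  =>  a ∘ b; A
fwdE :: Sign -> Sign -> Maybe Sign
fwdE (a, x :/ y) (b, y') | y == y' = Just (a ++ " " ++ b, x)
fwdE _ _ = Nothing

-- Backward Slash Elimination (\E): b; B  +  a; B\A  =>  b ∘ a; A
bwdE :: Sign -> Sign -> Maybe Sign
bwdE (b, y') (a, y :\ x) | y == y' = Just (b ++ " " ++ a, x)
bwdE _ _ = Nothing

-- the lexicon in (2)
john, mary, loves :: Sign
john  = ("john", NP)
mary  = ("mary", NP)
loves = ("loves", (NP :\ S) :/ NP)

-- the derivation in (3): /E, then \E
main :: IO ()
main = print (fwdE loves mary >>= bwdE john)
-- Just ("john loves mary",S)
```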

Things will start looking more interesting as one makes the fragment more complex (and also by adding the semantics), but before doing so, I first introduce some basic assumptions, first about syntactic categories (below) and then about semantics (next section).


*Syntactic categories* (or *syntactic types*) are defined recursively in CG. This can be concisely written using the so-called "BNF notation" as follows:<sup>3,4</sup>

	- a. BaseType := NP | N | PP | S
	- b. Type := BaseType | (Type\Type) | (Type/Type)

In words, anything that is a BaseType is a Type, and any complex expression of form *A*\*B* or *A*/*B* where *A* and *B* are both Types is a Type. To give some examples, the following expressions are syntactic types according to the definition in (4):<sup>5</sup>

	- b. (NP\S)/NP/NP
	- c. (S/(NP\S))\(S/NP)
	- d. ((NP\S)\(NP\S))\((NP\S)\(NP\S))

One important feature of CG is that, like HPSG, it lexicalizes the valence (or subcategorization) properties of linguistic expressions. Unlike HPSG, where this is done by a list (or set) valued syntactic feature, in CG, complex syntactic categories directly represent the combinatoric (i.e. valence) properties of lexical items. For example, lexical entries for intransitive and transitive verbs in English will look like the following (semantics is omitted here but will be supplied later):

	- a. ran; NP\S
	- b. read; (NP\S)/NP
	- c. introduces; (NP\S)/PP/NP

<sup>3</sup>See Section 3.3 below for the treatment of syntactic features (such as those used for agreement). I ignore this aspect for the fragment developed below for the sake of exposition. The treatment of syntactic features (or its analog) is a relatively underdeveloped aspect of CG syntax literature, as compared to HPSG research (where the whole linguistic theory is built on the basis of a theory/formalism of complex feature structures). CCG seems to assume something similar to feature unification in HPSG, though details are typically not worked out explicitly. In TLCG, there are occasional suggestions in the literature (see, for example, Morrill 1994: Chapter 6, Section 2; Pogodalla & Pompigne 2012) that syntactic features can be formalized in terms of dependent types (Martin-Löf 1984; Ranta 1994), but there is currently no in-depth study working out a theory of syntactic features along these lines.

<sup>4</sup>Recognizing PP as a basic type is somewhat non-standard, although there does not seem to be any consensus on what should be regarded as a (reasonably complete) set of basic syntactic types for natural language syntax.

<sup>5</sup> I omit parentheses for a sequence of the same type of slash, for which disambiguation is obvious – for example, *A*\*A*\*A* is an abbreviation for (*A*\(*A*\*A*)).


(6a) says that the verb *ran* combines with its argument NP *to its left* to become an S. Likewise, (6b) says that *read* first combines with an NP *to its right* and then another NP to its left to become an S.

One point to keep in mind (though it may not seem to make much difference at this point) is that in CG, syntactic rules are thought of as logical rules, and the derivations of sentences like (3) are thought of as *proofs* of the well-formedness of particular strings as sentences. From this logical point of view, the two slashes should really be thought of as directional variants of implication (that is, both *A*/*B* and *B*\*A* essentially mean '*if* there is a *B*, *then* there is an *A*'), and the two rules of Slash Elimination introduced in (1) should be thought of as directional variants of *modus ponens* ($A \to B,\ A \vdash B$). This analogy between natural language syntax and logic is emphasized in particular in TLCG research.
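To make the analogy vivid, the following is my own schematic rendering (not a display from the sources cited) of the Elimination rules side by side with *modus ponens*:

$$\frac{A/B \qquad B}{A}\ /\mathrm{E} \qquad \frac{B \qquad B\backslash A}{A}\ \backslash\mathrm{E} \qquad \text{cf.} \qquad \frac{B \to A \qquad B}{A}\ {\to}\mathrm{E}$$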

# **2.3 Syntax-semantics interface in CG**

One attractive property of Categorial Grammar as a theory of natural language syntax is its straightforward syntax-semantics interface. In particular, there is a functional mapping from syntactic categories to semantic types.<sup>6</sup> For the sake of exposition, I assume an extensional fragment of Montagovian model-theoretic semantics in what follows, but it should be noted that the CG syntax is mostly neutral to the choice of the specific variant of semantic theory to go with it.<sup>7</sup>

Assuming the standard recursive definition of semantic types as in (7) (with basic types $e$ for individuals and $t$ for truth values), the function Sem (which returns, for each syntactic category given as input, its semantic type) can be defined as in (8) and (9).

(7) a. BaseSemType := $e$ | $t$
	b. SemType := BaseSemType | SemType → SemType

(8) a. Sem(NP) = Sem(PP) = $e$
	b. Sem(N) = $e \to t$
	c. Sem(S) = $t$

<sup>6</sup>Technically, this is ensured in TLCG by the homomorphism from the syntactic type logic to the semantic type logic (the latter of which is often implicit) and the so-called Curry-Howard correspondence between proofs and terms (van Benthem 1988).

<sup>7</sup>See, for example, Martin (2013) and Bekki & Mineshima (2017) for recent proposals on adopting compositional variants of (hyper)intensional dynamic semantics and proof theoretic semantics, respectively, for the semantic component of CG-based theories of natural language.


(9) (Recursive Clause) For any complex syntactic category of the form *A*/*B* (or *B*\*A*), Sem(*A*/*B*) (= Sem(*B*\*A*)) = Sem(*B*) → Sem(*A*)

For example, Sem(S/(NP\S)) = $(e \to t) \to t$ (the type for a subject-position quantifier in CCG).
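The homomorphic character of Sem can be made explicit in a small executable sketch (my illustration only; the names are invented):

```haskell
-- A sketch of the mapping Sem in (8)-(9): a homomorphism from syntactic
-- categories to semantic types, sending both slashes to the function
-- type constructor.
data Cat = NP | N | PP | S | Cat :/ Cat | Cat :\ Cat deriving Show
data SemType = E | T | SemType :-> SemType deriving Show

sem :: Cat -> SemType
sem NP       = E
sem PP       = E
sem N        = E :-> T
sem S        = T
sem (a :/ b) = sem b :-> sem a  -- Sem(A/B) = Sem(B) -> Sem(A)
sem (b :\ a) = sem b :-> sem a  -- Sem(B\A) = Sem(B) -> Sem(A)

main :: IO ()
main = print (sem (S :/ (NP :\ S)))
-- (E :-> T) :-> T, the subject-position quantifier type
```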

Syntactic rules with semantics can then be written as in (10) (where the semantic effect of these rules is *function application*) and a sample derivation with semantic annotation is given in (11).

$$\text{(10) a. Forward Slash Elimination: } \dfrac{\mathsf{a};\, \mathcal{F};\, A/B \qquad \mathsf{b};\, \mathcal{G};\, B}{\mathsf{a} \circ \mathsf{b};\, \mathcal{F}(\mathcal{G});\, A}\ /\mathrm{E} \qquad \text{b. Backward Slash Elimination: } \dfrac{\mathsf{b};\, \mathcal{G};\, B \qquad \mathsf{a};\, \mathcal{F};\, B\backslash A}{\mathsf{b} \circ \mathsf{a};\, \mathcal{F}(\mathcal{G});\, A}\ \backslash\mathrm{E}$$
$$\text{(11) } \dfrac{\mathsf{john};\, \mathbf{j};\, \mathrm{NP} \qquad \dfrac{\mathsf{loves};\, \mathbf{love};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP} \qquad \mathsf{mary};\, \mathbf{m};\, \mathrm{NP}}{\mathsf{loves} \circ \mathsf{mary};\, \mathbf{love}(\mathbf{m});\, \mathrm{NP}\backslash\mathrm{S}}\ /\mathrm{E}}{\mathsf{john} \circ \mathsf{loves} \circ \mathsf{mary};\, \mathbf{love}(\mathbf{m})(\mathbf{j});\, \mathrm{S}}\ \backslash\mathrm{E}$$

A system of CG with only the Slash Elimination rules like the fragment above is called the *AB Grammar*, so called because it corresponds to the earliest form of CG formulated by Ajdukiewicz (1935) and Bar-Hillel (1953).

# **2.4 Combinatory Categorial Grammar**

# **2.4.1 An "ABC" fragment: AB Grammar with order-preserving combinatory rules**

Some more machinery is needed to do some interesting linguistic analysis. I now extend the AB fragment above by adding two types of rules: *Type Raising* and (*Harmonic*) *Function Composition*. These are a subset of rules typically entertained in CCG. I call the resultant system *ABC Grammar* (AB + Function Composition).<sup>8</sup> Though it is an impoverished version of CCG, the ABC fragment already enables an interesting and elegant analysis of *nonconstituent coordination* (NCC), originally due to Steedman (1985) and Dowty (1988), which is essentially identical to the analysis of NCC in the current versions of both CCG and TLCG. I will then discuss the rest of the rules constituting CCG in the next section. The reason for drawing a distinction between the "ABC" fragment and (proper) CCG is just for the sake of exposition.

<sup>8</sup>This is not a standard terminology, but giving a name to this fragment is convenient for the purpose of the discussion below.


The rules introduced in the present section have the property that they are all derivable as *theorems* in the (associative) Lambek calculus, the calculus that underlies most variants of TLCG. For this reason, separating the two sets of rules helps clarify the similarities and differences between CCG and TLCG.

The *Function Composition* and *Type Raising* rules are defined as in (12) and (13), respectively.

(12) a. Forward Function Composition:
$$\frac{\mathsf{a};\, \mathcal{F};\, A/B \qquad \mathsf{b};\, \mathcal{G};\, B/C}{\mathsf{a} \circ \mathsf{b};\, \lambda x.\mathcal{F}(\mathcal{G}(x));\, A/C}\,\mathrm{FC}$$

	b. Backward Function Composition:
$$\frac{\mathsf{b};\, \mathcal{G};\, C\backslash B \qquad \mathsf{a};\, \mathcal{F};\, B\backslash A}{\mathsf{b} \circ \mathsf{a};\, \lambda x.\mathcal{F}(\mathcal{G}(x));\, C\backslash A}\,\mathrm{FC}$$

(13) a. Forward Type Raising:

$$\frac{\mathsf{a};\, \mathcal{F};\, A}{\mathsf{a};\, \lambda v.v(\mathcal{F});\, B/(A\backslash B)}\,\mathrm{TR}$$

b. Backward Type Raising:

$$\frac{\mathsf{a}; \mathcal{F}; A}{\mathsf{a}; \lambda v. v(\mathcal{F}); (B/A)\backslash B} \text{ TR}$$

The Type Raising rules are essentially rules of "type lifting" familiar in the formal semantics literature, except that they specify the "syntactic effect" of type lifting explicitly (such that the function-argument relation is reversed). Similarly Function Composition rules can be understood as function composition in the usual sense (as in mathematics and functional programming), except, again, that the syntactic effect is explicitly specified.

As noted by Steedman (1985), with Type Raising and Function Composition, a string of words such as *John loves* can be analyzed as a constituent of type S/NP, that is, an expression that is looking for an NP to its right to become an S:<sup>9</sup>

(14)
$$\frac{\dfrac{\mathsf{john};\, \mathbf{j};\, \mathrm{NP}}{\mathsf{john};\, \lambda f.f(\mathbf{j});\, \mathrm{S}/(\mathrm{NP}\backslash\mathrm{S})}\,\mathrm{TR} \qquad \mathsf{loves};\, \mathbf{love};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP}}{\mathsf{john} \circ \mathsf{loves};\, \lambda x.\mathbf{love}(x)(\mathbf{j});\, \mathrm{S/NP}}\,\mathrm{FC}$$

Intuitively, Function Composition has the effect of delaying the application of a function. The verb is looking for a direct object to its right before it can be taken as an argument (of type NP\S) of the type raised subject NP. Function Composition directly combines the subject and the verb before the direct object argument of the latter is saturated. The resultant category inherits the unsaturated argument both in the syntactic category (S/NP) and in the semantics (of type *e* → *t*).
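The semantic halves of these rules are the familiar lifting and composition combinators; a small Haskell sketch (my own stand-in encoding) of the meaning side of (14):

```haskell
-- Semantic actions only: Type Raising is type lifting, Function
-- Composition is ordinary function composition.
typeRaise :: a -> ((a -> b) -> b)
typeRaise f = \v -> v f

comp :: (b -> a) -> (c -> b) -> (c -> a)
comp f g = \x -> f (g x)

-- Stand-in meaning for "loves": direct object first, then subject.
love :: String -> String -> String
love x y = "love(" ++ x ++ ")(" ++ y ++ ")"

-- The meaning of "John loves" as an S/NP, as in (14):
johnLoves :: String -> String
johnLoves = comp (typeRaise "j") love

-- johnLoves "m"  ==  "love(m)(j)"
```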

<sup>9</sup> **love** is a function of type *e* → *e* → *t*, where the first argument corresponds to the direct object. Thus, **love**(*y*)(*x*) is equivalent to the two-place relation notation **love**(*x*, *y*) in which the subject argument is written first.


Assuming generalized conjunction (with the standard definition for the generalized conjunction operator ⊓ *à la* Partee & Rooth (1983) and the polymorphic syntactic category (*X*\*X*)/*X* for *and*), the analysis for a *Right Node Raising* (RNR) sentence such as (15) is straightforward, as in (16).

(15) John loves, and Bill hates, Mary.

(16)
$$\dfrac{\dfrac{\dfrac{\vdots}{\mathsf{john} \circ \mathsf{loves};\, \lambda x.\mathbf{love}(x)(\mathbf{j});\, \mathrm{S/NP}} \qquad \dfrac{\mathsf{and};\, \sqcap;\, (X\backslash X)/X \qquad \dfrac{\vdots}{\mathsf{bill} \circ \mathsf{hates};\, \lambda x.\mathbf{hate}(x)(\mathbf{b});\, \mathrm{S/NP}}}{\mathsf{and} \circ \mathsf{bill} \circ \mathsf{hates};\, \sqcap(\lambda x.\mathbf{hate}(x)(\mathbf{b}));\, (\mathrm{S/NP})\backslash(\mathrm{S/NP})}\,/\mathrm{E}}{\mathsf{john} \circ \mathsf{loves} \circ \mathsf{and} \circ \mathsf{bill} \circ \mathsf{hates};\, \lambda x.\mathbf{love}(x)(\mathbf{j}) \sqcap \lambda x.\mathbf{hate}(x)(\mathbf{b});\, \mathrm{S/NP}}\,\backslash\mathrm{E} \qquad \mathsf{mary};\, \mathbf{m};\, \mathrm{NP}}{\mathsf{john} \circ \mathsf{loves} \circ \mathsf{and} \circ \mathsf{bill} \circ \mathsf{hates} \circ \mathsf{mary};\, \mathbf{love}(\mathbf{m})(\mathbf{j}) \wedge \mathbf{hate}(\mathbf{m})(\mathbf{b});\, \mathrm{S}}\,/\mathrm{E}$$
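The generalized conjunction operator ⊓ used here is defined recursively over conjoinable types; the standard formulation (along the lines of Partee & Rooth 1983) is:

$$X \sqcap Y = X \wedge Y \quad \text{for $X$, $Y$ of type $t$;} \qquad (X \sqcap Y)(z) = X(z) \sqcap Y(z) \quad \text{for $X$, $Y$ of a functional type.}$$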
Dowty (1988) showed that this analysis extends straightforwardly to the (slightly) more complex case of *Argument Cluster Coordination* (ACC), such as (17), as in (18) (here, VP, TV and DTV are abbreviations of NP\S, (NP\S)/NP and (NP\S)/NP/NP, respectively).

(17) Mary gave Bill the book and John the record.

(18)
$$\dfrac{\mathsf{mary};\, \mathbf{m};\, \mathrm{NP} \qquad \dfrac{\mathsf{gave};\, \mathbf{give};\, \mathrm{DTV} \qquad \dfrac{\dfrac{\dfrac{\mathsf{bill};\, \mathbf{b};\, \mathrm{NP}}{\mathsf{bill};\, \lambda v.v(\mathbf{b});\, \mathrm{DTV}\backslash\mathrm{TV}}\,\mathrm{TR} \qquad \dfrac{\mathsf{the} \circ \mathsf{book};\, \iota(\mathbf{bk});\, \mathrm{NP}}{\mathsf{the} \circ \mathsf{book};\, \lambda w.w(\iota(\mathbf{bk}));\, \mathrm{TV}\backslash\mathrm{VP}}\,\mathrm{TR}}{\mathsf{bill} \circ \mathsf{the} \circ \mathsf{book};\, \lambda f.f(\mathbf{b})(\iota(\mathbf{bk}));\, \mathrm{DTV}\backslash\mathrm{VP}}\,\mathrm{FC} \qquad \dfrac{\mathsf{and};\, \sqcap;\, (X\backslash X)/X \qquad \dfrac{\vdots}{\mathsf{john} \circ \mathsf{the} \circ \mathsf{record};\, \lambda f.f(\mathbf{j})(\iota(\mathbf{rc}));\, \mathrm{DTV}\backslash\mathrm{VP}}}{\mathsf{and} \circ \mathsf{john} \circ \mathsf{the} \circ \mathsf{record};\, \sqcap(\lambda f.f(\mathbf{j})(\iota(\mathbf{rc})));\, (\mathrm{DTV}\backslash\mathrm{VP})\backslash(\mathrm{DTV}\backslash\mathrm{VP})}\,/\mathrm{E}}{\mathsf{bill} \circ \mathsf{the} \circ \mathsf{book} \circ \mathsf{and} \circ \mathsf{john} \circ \mathsf{the} \circ \mathsf{record};\, \lambda f.f(\mathbf{b})(\iota(\mathbf{bk})) \sqcap \lambda f.f(\mathbf{j})(\iota(\mathbf{rc}));\, \mathrm{DTV}\backslash\mathrm{VP}}\,\backslash\mathrm{E}}{\mathsf{gave} \circ \mathsf{bill} \circ \mathsf{the} \circ \mathsf{book} \circ \mathsf{and} \circ \mathsf{john} \circ \mathsf{the} \circ \mathsf{record};\, \mathbf{give}(\mathbf{b})(\iota(\mathbf{bk})) \sqcap \mathbf{give}(\mathbf{j})(\iota(\mathbf{rc}));\, \mathrm{VP}}\,\backslash\mathrm{E}}{\mathsf{mary} \circ \mathsf{gave} \circ \mathsf{bill} \circ \mathsf{the} \circ \mathsf{book} \circ \mathsf{and} \circ \mathsf{john} \circ \mathsf{the} \circ \mathsf{record};\, \mathbf{give}(\mathbf{b})(\iota(\mathbf{bk}))(\mathbf{m}) \wedge \mathbf{give}(\mathbf{j})(\iota(\mathbf{rc}))(\mathbf{m});\, \mathrm{S}}\,\backslash\mathrm{E}$$

Here, by Type Raising, the indirect and direct objects become functions that can be combined via Function Composition, to form a non-standard constituent that can then be coordinated. After two such expressions are conjoined, the verb is fed as an argument to return a VP. Intuitively, the idea behind this analysis is that *Bill the book* is of type DTV\VP since if it were to combine with an actual ditransitive verb (such as *gave*), a VP (*gave Bill the book*) would be obtained. Note that in both the RNR and ACC examples above, the right semantic interpretation for the whole sentence is assigned compositionally via the rules given above in (12) and (13).
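Unfolding the last two steps of (18) makes the compositional bookkeeping explicit: the conjoined DTV\VP meaning is applied to **give**, and ⊓ then distributes down to truth values once the subject is supplied, as licensed by its recursive definition (using the ι notation from (18)):

$$\big[\lambda f.f(\mathbf{b})(\iota(\mathbf{bk})) \sqcap \lambda f.f(\mathbf{j})(\iota(\mathbf{rc}))\big](\mathbf{give}) \;=\; \mathbf{give}(\mathbf{b})(\iota(\mathbf{bk})) \sqcap \mathbf{give}(\mathbf{j})(\iota(\mathbf{rc}))$$

$$\big[\mathbf{give}(\mathbf{b})(\iota(\mathbf{bk})) \sqcap \mathbf{give}(\mathbf{j})(\iota(\mathbf{rc}))\big](\mathbf{m}) \;=\; \mathbf{give}(\mathbf{b})(\iota(\mathbf{bk}))(\mathbf{m}) \wedge \mathbf{give}(\mathbf{j})(\iota(\mathbf{rc}))(\mathbf{m})$$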

### **2.4.2 From ABC to CCG**

CCG is a version of CG developed by Mark Steedman since the 1980s with extensive linguistic application. The best sources for CCG are the three books by Steedman (Steedman 1996; 2000; 2012), which present treatments of major linguistic phenomena in CCG and give pointers to earlier literature. CCG is essentially a rule-based extension of the AB Grammar. The previous section has already introduced two key components that constitute this extension: Type Raising and (Harmonic) Function Composition.<sup>10</sup> There are aspects of natural language syntax that cannot be handled adequately in this simple system, and in such situations, CCG makes (restricted) use of additional rules. This point can be illustrated nicely with two issues that arise in connection with the analysis of long-distance dependencies.

The basic idea behind the CCG analysis of long-distance dependencies, due originally to Ades & Steedman (1982), is very simple and is similar in spirit to the HPSG analysis in terms of SLASH feature percolation (see Borsley & Crysmann 2021, Chapter 13 of this volume for the treatment of long-distance dependencies in HPSG). Specifically, CCG analyzes extraction dependencies via a chain of Function Composition, as illustrated by the derivation for (19) in (20).

(19) This is the book that John thought that Mary read \_.

(20)
$$\dfrac{\mathsf{that};\, \lambda Q\lambda P\lambda x.P(x) \wedge Q(x);\, (\mathrm{N}\backslash\mathrm{N})/(\mathrm{S/NP}) \qquad \dfrac{\dfrac{\mathsf{john};\, \mathbf{j};\, \mathrm{NP}}{\mathsf{john};\, \lambda f.f(\mathbf{j});\, \mathrm{S}/(\mathrm{NP}\backslash\mathrm{S})}\,\mathrm{TR} \qquad \dfrac{\mathsf{thought};\, \mathbf{think};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{S}' \qquad \dfrac{\mathsf{that};\, \lambda p.p;\, \mathrm{S}'/\mathrm{S} \qquad \dfrac{\dfrac{\mathsf{mary};\, \mathbf{m};\, \mathrm{NP}}{\mathsf{mary};\, \lambda f.f(\mathbf{m});\, \mathrm{S}/(\mathrm{NP}\backslash\mathrm{S})}\,\mathrm{TR} \qquad \mathsf{read};\, \mathbf{read};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP}}{\mathsf{mary} \circ \mathsf{read};\, \lambda x.\mathbf{read}(x)(\mathbf{m});\, \mathrm{S/NP}}\,\mathrm{FC}}{\mathsf{that} \circ \mathsf{mary} \circ \mathsf{read};\, \lambda x.\mathbf{read}(x)(\mathbf{m});\, \mathrm{S}'/\mathrm{NP}}\,\mathrm{FC}}{\mathsf{thought} \circ \mathsf{that} \circ \mathsf{mary} \circ \mathsf{read};\, \lambda x.\mathbf{think}(\mathbf{read}(x)(\mathbf{m}));\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP}}\,\mathrm{FC}}{\mathsf{john} \circ \mathsf{thought} \circ \mathsf{that} \circ \mathsf{mary} \circ \mathsf{read};\, \lambda x.\mathbf{think}(\mathbf{read}(x)(\mathbf{m}))(\mathbf{j});\, \mathrm{S/NP}}\,\mathrm{FC}}{\mathsf{that} \circ \mathsf{john} \circ \mathsf{thought} \circ \mathsf{that} \circ \mathsf{mary} \circ \mathsf{read};\, \lambda P\lambda x.P(x) \wedge \mathbf{think}(\mathbf{read}(x)(\mathbf{m}))(\mathbf{j});\, \mathrm{N}\backslash\mathrm{N}}\,/\mathrm{E}$$

Like (many versions of) HPSG, CCG does not assume any empty expression at the gap site. Instead, the information that the subexpressions (constituting the extraction pathway) such as *Mary read* and *thought that Mary read* are missing an NP on the right edge is encoded in the syntactic category of the linguistic expression. *Mary read* is assigned the type S/NP, since it is a sentence missing an NP on its right edge. *thought that Mary read* is of type VP/NP since it is a VP missing an NP on its right edge, etc. Expressions that are not originally functions (such as the subject NPs in the higher and lower clauses inside the relative clause in (19)) are first type raised. Then, Function Composition effectively "delays" the saturation of the object NP argument of the embedded verb, until the whole relative clause meets the relative pronoun, which itself is a higher-order function that takes a sentence missing an NP (of type S/NP) as an argument.

<sup>10</sup>There is actually a subtle point about Type Raising rules. Recent versions of CCG (Steedman 2012: 80) do not take them to be syntactic rules, but rather assume that Type Raising is an operation in the lexicon. This choice seems to be motivated by parsing considerations (so as to eliminate as many unary rules as possible from the syntax). It is also worth noting in this connection that the CCG-based syntactic fragment that Jacobson (1999; 2000) assumes for her Variable-Free Semantics is actually a quite different system from Steedman's version of CCG in that it crucially assumes Geach rules, another type of unary rules likely to have similar computational consequences as Type Raising rules, in the syntactic component. (Incidentally, the Geach rules are often attributed to Geach (1970), but Humberstone's (2005) careful historical study suggests that this attribution is highly misleading, if not totally groundless.)

The successive passing of the /NP specification to larger structures is essentially analogous to the treatment of extraction via the SLASH feature in HPSG. However, unlike HPSG, which has a dedicated feature that handles this information passing, CCG achieves the effect via the ordinary slash that is also used for local syntactic composition.

This difference immediately raises some issues for the CCG analysis of extraction. First, in (19), the NP gap happens to be on the right edge of the sentence, but this is not always the case. Harmonic Function Composition alone cannot handle non-peripheral extraction of the sort found in examples such as the following:

(21) This is the book that John thought that [Mary read \_ at school].

Assuming that *at school* is a VP modifier of type (NP\S)\(NP\S), what is needed here is a mechanism that assigns the type (NP\S)/NP to the string *read* \_ *at school*, despite the fact that the missing NP is not on the right edge. CCG employs a special rule of "Crossed" Function Composition for this purpose, defined as follows:


(22) Crossed Function Composition:

$$\frac{\mathsf{a}; \mathcal{G}; A/B \qquad \mathsf{b}; \mathcal{F}; A \backslash C}{\mathsf{a} \circ \mathsf{b}; \lambda \mathsf{x}. \mathcal{F}(\mathcal{G}(\mathsf{x})); C/B} \text{ xFC}$$

Unlike its harmonic counterpart (in which *a* has the type *B*\*A*), in (22) the directionality of the slash is different in the two premises, and the resultant category inherits the slash originally associated with the inherited argument (i.e. /*B*).
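Semantically, crossed composition performs exactly the same operation as its harmonic counterpart; only the directional bookkeeping differs. A small Haskell sketch (my own stand-in meanings, not the chapter's fragment) of the *read \_ at school* case:

```haskell
-- Crossed composition (22) semantically: \x -> F (G x), exactly as in (12).
xcomp :: (a -> c) -> (b -> a) -> (b -> c)
xcomp f g = \x -> f (g x)

type E = String   -- stand-in for entities
type T = String   -- stand-in for truth-value descriptions

readV :: E -> (E -> T)            -- an (NP\S)/NP meaning: object, then subject
readV x y = "read(" ++ x ++ ")(" ++ y ++ ")"

atSchool :: (E -> T) -> (E -> T)  -- an (NP\S)\(NP\S) modifier meaning
atSchool vp y = "at-school(" ++ vp y ++ ")"

-- "read _ at school" as an (NP\S)/NP meaning, with the NP gap inherited:
readAtSchool :: E -> (E -> T)
readAtSchool = xcomp atSchool readV

-- readAtSchool "the-book" "m"  ==  "at-school(read(the-book)(m))"
```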

Once this non-order-preserving version of Function Composition is introduced in the grammar, the derivation for (21) is straightforward, as in (23), where the crucial step composes the verb with the VP modifier; the remaining steps proceed via Type Raising and Function Composition exactly as in (20):

(23)
$$\dfrac{\dfrac{\mathsf{read};\, \mathbf{read};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP} \qquad \mathsf{at} \circ \mathsf{school};\, \mathbf{at\text{-}school};\, (\mathrm{NP}\backslash\mathrm{S})\backslash(\mathrm{NP}\backslash\mathrm{S})}{\mathsf{read} \circ \mathsf{at} \circ \mathsf{school};\, \lambda x.\mathbf{at\text{-}school}(\mathbf{read}(x));\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP}}\,\mathrm{xFC}}{\vdots}$$


Unless appropriately constrained, the addition of the crossed composition rule leads to potential overgeneration, since non-extracted expressions cannot change word order so freely in English. For example, without additional restrictions, the simple CCG fragment above overgenerates examples such as the following (see, for example, Kuhlmann et al. 2015: 188):

(24) \* a$_{\mathrm{NP/N}}$ [powerful$_{\mathrm{N/N}}$ by Rivaldo$_{\mathrm{N\backslash N}}$]$_{\mathrm{N/N}}$ shot$_{\mathrm{N}}$

Here, I will not go into the technical details of how this issue is addressed in the CCG literature. In contemporary versions of CCG, the application of special rules such as crossed composition in (22) is regulated by the notion of "structural control" borrowed into CCG from the "multi-modal" variant of TLCG (see Baldridge (2002) and Steedman & Baldridge (2011)).

Another issue that arises in connection with extraction is how to treat multiple gaps corresponding to a single filler. The simple fragment developed above cannot license examples involving parasitic gaps such as the following:<sup>11</sup>

(25) a. This is the article that I filed \_ without reading \_.

	- b. Peter is a guy who even the best friends of \_ think \_ should be closely watched.

<sup>11</sup>Multiple gaps in coordination (i.e. ATB extraction) are not an issue, since these cases can be handled straightforwardly via the polymorphic definition of generalized conjunction in CCG, in just the same way that unsaturated shared arguments in each conjunct are identified with one another.


Since neither Type Raising nor Function Composition changes the number of "gaps" passed on to a larger expression, a new mechanism is needed here. Steedman (1987: 427) proposes the following rule to deal with this issue:

(26) Substitution:
$$\frac{\mathsf{a};\, \mathcal{G};\, A/B \qquad \mathsf{b};\, \mathcal{F};\, (A\backslash C)/B}{\mathsf{a} \circ \mathsf{b};\, \lambda x.\mathcal{F}(x)(\mathcal{G}(x));\, C/B}\,\mathrm{S}$$
This rule has the effect of "collapsing" the arguments of the two inputs into one, to be saturated by a single filler. The derivation for the adjunct parasitic gap example in (25a) then goes as follows (where VP is an abbreviation for NP\S):

(27)
$$\dfrac{\dfrac{\mathsf{filed};\, \mathbf{file};\, \mathrm{VP/NP} \qquad \dfrac{\mathsf{without};\, \mathbf{wo};\, (\mathrm{VP}\backslash\mathrm{VP})/\mathrm{VP} \qquad \mathsf{reading};\, \mathbf{read};\, \mathrm{VP/NP}}{\mathsf{without} \circ \mathsf{reading};\, \lambda x.\mathbf{wo}(\mathbf{read}(x));\, (\mathrm{VP}\backslash\mathrm{VP})/\mathrm{NP}}\,\mathrm{FC}}{\mathsf{filed} \circ \mathsf{without} \circ \mathsf{reading};\, \lambda x.\mathbf{wo}(\mathbf{read}(x))(\mathbf{file}(x));\, \mathrm{VP/NP}}\,\mathrm{S}}{\vdots}$$


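The semantic action of Substitution is the classic S combinator, which duplicates a single argument across the two input functions; a minimal Haskell sketch (my own):

```haskell
-- Substitution (26), semantically: \x -> F x (G x) -- one argument, two uses.
subst :: (a -> b) -> (a -> b -> c) -> (a -> c)
subst g f = \x -> f x (g x)

-- E.g., with file :: NP -> VP-meaning and withoutReading :: NP -> VP-meaning -> VP-meaning,
-- subst file withoutReading is the VP/NP meaning of "filed _ without reading _",
-- in which both gaps are saturated by a single filler.
```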
Like the crossed composition rule, the availability of the substitution rule should be restricted to extraction environments. In earlier versions of CCG, this was done by a stipulation on the rule itself. Baldridge (2002) proposed an improvement of the organization of the CCG rule system in which the applicability of particular rules is governed by lexically specified "modality" encodings. See Steedman & Baldridge (2011) for this relatively recent development in CCG.

# **2.5 Type-Logical Categorial Grammar**

The rule-based nature of CCG should be clear from the above exposition. Though superficially similar in many respects, TLCG takes a distinctly different perspective on the underlying architecture of the grammar of natural language. Specifically, in TLCG, the rule system of grammar is literally taken to be a kind of logic. Consequently, all (or almost all) grammar rules are logical inference rules reflecting the properties of (typically a small number of) logical connectives such as / and \ (which are, as noted in Section 2.2, viewed as directional variants of implication). It is important to keep in mind that this leads to an inherently much more abstract view on the organization of the grammar of natural language than the surface-oriented perspective that HPSG and CCG share at a broad level. This conceptual shift can be best illustrated by first replacing the ABC Grammar introduced in Section 2.4.1 by the *Lambek calculus*, where all the rules posited as


primitive rules in the former are derived as *theorems* (in the technical sense of the term) in the latter.

Before moving on, I should hasten to note that the TLCG literature is more varied than the CCG literature, consisting of several related but distinct lines of research. I choose to present one particular variant called Hybrid Type-Logical Categorial Grammar (Kubota & Levine 2020) in what follows, in line with the present chapter's linguistic emphasis (for a more in-depth discussion on the linguistic application of TLCG, see Carpenter 1998 and Kubota & Levine 2020). A brief comparison with major alternatives can be found in Chapter 12 of Kubota & Levine (2020). Other variants of TLCG, most notably, the *Categorial Type Logics* (Moortgat 2011) and *Displacement Calculus* (Morrill 2011) emphasize logical and computational aspects. Moot & Retoré (2012) is a good introduction to TLCG with emphasis on these latter aspects.

### **2.5.1 The Lambek calculus**

In addition to the *Slash Elimination* rules (reproduced here as (28)), which are identical to the two rules in the AB Grammar from Section 2.2, the Lambek calculus posits the *Slash Introduction* rules, which can be written in the current labeled deduction format as in (29) (the vertical dots around the hypothesis abbreviate an arbitrarily complex proof structure).<sup>12</sup>

(28) a. Forward Slash Elimination:
$$\frac{\mathsf{a};\, \mathcal{F};\, A/B \qquad \mathsf{b};\, \mathcal{G};\, B}{\mathsf{a} \circ \mathsf{b};\, \mathcal{F}(\mathcal{G});\, A}\,/\mathrm{E}$$

	b. Backward Slash Elimination:
$$\frac{\mathsf{b};\, \mathcal{G};\, B \qquad \mathsf{a};\, \mathcal{F};\, B\backslash A}{\mathsf{b} \circ \mathsf{a};\, \mathcal{F}(\mathcal{G});\, A}\,\backslash\mathrm{E}$$


(29) a. Forward Slash Introduction:
$$\frac{\begin{array}{c}\vdots \quad [\phi;\, x;\, A]^{n} \quad \vdots\\ \vdots\\ \mathsf{b} \circ \phi;\, \mathcal{F};\, B\end{array}}{\mathsf{b};\, \lambda x.\mathcal{F};\, B/A}\,/\mathrm{I}^{n}$$

	b. Backward Slash Introduction:
$$\frac{\begin{array}{c}\vdots \quad [\phi;\, x;\, A]^{n} \quad \vdots\\ \vdots\\ \phi \circ \mathsf{b};\, \mathcal{F};\, B\end{array}}{\mathsf{b};\, \lambda x.\mathcal{F};\, A\backslash B}\,\backslash\mathrm{I}^{n}$$

The key idea behind the Slash Introduction rules in (29) is that they allow one to derive linguistic expressions by *hypothetically* assuming the existence of words and phrases that are not (necessarily) overtly present. For example, (29a) can be understood as consisting of two steps of inference: one first draws a (tentative)

<sup>12</sup>Morrill (1994: Chapter 4) was the first to recast the Lambek calculus in this labelled deduction format.


conclusion that the string of words *b* ◦ φ is of type *B*, by hypothetically assuming the existence of an expression φ of type *A* (where a hypothesis is enclosed in square brackets to indicate its status as such). At that point, one can draw the (real) conclusion that *b* alone is of type *B*/*A* since it was just shown to be an expression that yields *B* *if* there is an *A* (namely, φ) to its right. Note that the final conclusion no longer depends on the hypothesis that there is an expression φ of type *A*. More technically, the hypothesis is *withdrawn* at the final step.

One consequence that immediately follows in this system is that Type Raising and Function Composition (as well as various other rules; see, for example, Jäger 2005: 46–49) are now derivable as theorems. As an illustration, the proofs for (13a) and (12a) are shown in (30) and (31), respectively.

(30)
$$\dfrac{\dfrac{\mathsf{a};\, \mathcal{F};\, A \qquad [\phi;\, v;\, A\backslash B]^{1}}{\mathsf{a} \circ \phi;\, v(\mathcal{F});\, B}\,\backslash\mathrm{E}}{\mathsf{a};\, \lambda v.v(\mathcal{F});\, B/(A\backslash B)}\,/\mathrm{I}^{1}$$

(31)
$$\dfrac{\dfrac{\mathsf{a};\, \mathcal{F};\, A/B \qquad \dfrac{\mathsf{b};\, \mathcal{G};\, B/C \qquad [\phi;\, x;\, C]^{1}}{\mathsf{b} \circ \phi;\, \mathcal{G}(x);\, B}\,/\mathrm{E}}{\mathsf{a} \circ \mathsf{b} \circ \phi;\, \mathcal{F}(\mathcal{G}(x));\, A}\,/\mathrm{E}}{\mathsf{a} \circ \mathsf{b};\, \lambda x.\mathcal{F}(\mathcal{G}(x));\, A/C}\,/\mathrm{I}^{1}$$

These are formal theorems, but they intuitively make sense. For example, what's going on in (31) is simple. Some expression of type *C* is hypothetically assumed first, which is then combined with *B*/*C*. This produces a larger expression of type *B*, which can then be fed as an argument to *A*/*B*. At that point, the initial hypothesis is withdrawn and it is concluded that what one really had was just something that would become an *A* *if* there is a *C* to its right, namely, an expression of type *A*/*C*. Thus, a sequence of expressions of types *A*/*B* and *B*/*C* is proven to be of type *A*/*C*. This type of proof is known as *hypothetical reasoning*, since it involves a step of positing a hypothesis initially and withdrawing that hypothesis at a later point.

Getting back to some notational issues, there are two crucial things to keep in mind about the notational convention adopted here (which I implicitly assumed above). First, the connective ◦ in the prosodic component designates string concatenation and is associative in both directions (i.e. (φ₁ ◦ φ₂) ◦ φ₃ ≡ φ₁ ◦ (φ₂ ◦ φ₃)). In other words, hierarchical structure is irrelevant for the prosodic representation. Thus, the applicability condition on the Forward Slash Introduction rule (29a) is simply that the prosodic variable φ of the hypothesis appears


as the rightmost element of the string prosody of the input expression (i.e. *b* ◦ φ). Since the penultimate step in (31) satisfies this condition, the rule is applicable here. Second, note in this connection that the application of the Introduction rules is conditioned on the position of the prosodic variable, and *not* on the position of the hypothesis itself in the proof tree (this latter convention is more standardly adopted when the Lambek calculus is presented in Prawitz-style natural deduction, though the two presentations are equivalent – see, for example, Carpenter 1998: Chapter 5 and Jäger 2005: Chapter 1).

Hypothetical reasoning with Slash Introduction makes it possible to recast the CCG analysis of nonconstituent coordination from Section 2.4.1 within the logic of / and \. This reformulation fully retains the essential analytic ideas of the original CCG analysis but makes the underlying logic of syntactic composition more transparent.

The following derivation illustrates how the "reanalysis" of the string *Bill the book* as a derived constituent of type (VP/NP/NP)\VP (the same type as in (18)) can be obtained in the Lambek calculus:

(32)
$$\dfrac{\dfrac{\dfrac{[\phi;\, f;\, \mathrm{VP/NP/NP}]^{1} \qquad \mathsf{bill};\, \mathbf{b};\, \mathrm{NP}}{\phi \circ \mathsf{bill};\, f(\mathbf{b});\, \mathrm{VP/NP}}\,/\mathrm{E} \qquad \mathsf{the} \circ \mathsf{book};\, \iota(\mathbf{bk});\, \mathrm{NP}}{\phi \circ \mathsf{bill} \circ \mathsf{the} \circ \mathsf{book};\, f(\mathbf{b})(\iota(\mathbf{bk}));\, \mathrm{VP}}\,/\mathrm{E}}{\mathsf{bill} \circ \mathsf{the} \circ \mathsf{book};\, \lambda f.f(\mathbf{b})(\iota(\mathbf{bk}));\, (\mathrm{VP/NP/NP})\backslash\mathrm{VP}}\,\backslash\mathrm{I}^{1}$$

At this point, one may wonder what the relationship is between the analysis of nonconstituent coordination via Type Raising and Function Composition in the ABC Grammar in Section 2.4.1 and the hypothetical reasoning-based analysis in the Lambek calculus just presented. Intuitively, they seem to achieve the same effect in slightly different ways. The logic-based perspective of TLCG allows us to obtain a deeper understanding of the relationship between them. To facilitate comparison, I first recast the Type Raising + Function Composition analysis from Section 2.4.1 in the Lambek calculus. The relevant part is the part that derives the "noncanonical constituent" *Bill the book*:

(33)
$$\dfrac{\dfrac{\dfrac{[\phi_{3};\, f_{3};\, \mathrm{DTV}]^{3} \qquad \dfrac{\dfrac{[\phi_{2};\, f_{2};\, \mathrm{DTV}]^{2} \qquad \mathsf{bill};\, \mathbf{b};\, \mathrm{NP}}{\phi_{2} \circ \mathsf{bill};\, f_{2}(\mathbf{b});\, \mathrm{TV}}\,/\mathrm{E}}{\mathsf{bill};\, \lambda f_{2}.f_{2}(\mathbf{b});\, \mathrm{DTV}\backslash\mathrm{TV}}\,\backslash\mathrm{I}^{2}}{\phi_{3} \circ \mathsf{bill};\, f_{3}(\mathbf{b});\, \mathrm{TV}}\,\backslash\mathrm{E} \qquad \dfrac{\dfrac{[\phi_{1};\, f_{1};\, \mathrm{TV}]^{1} \qquad \mathsf{the} \circ \mathsf{book};\, \iota(\mathbf{bk});\, \mathrm{NP}}{\phi_{1} \circ \mathsf{the} \circ \mathsf{book};\, f_{1}(\iota(\mathbf{bk}));\, \mathrm{VP}}\,/\mathrm{E}}{\mathsf{the} \circ \mathsf{book};\, \lambda f_{1}.f_{1}(\iota(\mathbf{bk}));\, \mathrm{TV}\backslash\mathrm{VP}}\,\backslash\mathrm{I}^{1}}{\phi_{3} \circ \mathsf{bill} \circ \mathsf{the} \circ \mathsf{book};\, f_{3}(\mathbf{b})(\iota(\mathbf{bk}));\, \mathrm{VP}}\,\backslash\mathrm{E}}{\mathsf{bill} \circ \mathsf{the} \circ \mathsf{book};\, \lambda f_{3}.f_{3}(\mathbf{b})(\iota(\mathbf{bk}));\, \mathrm{DTV}\backslash\mathrm{VP}}\,\backslash\mathrm{I}^{3}$$


By comparing (33) and (32), one can see that (33) contains some redundant steps. First, hypothesis 2 (φ₂) is introduced only to be replaced by hypothesis 3 (φ₃). This is completely redundant, since one could have obtained exactly the same result by directly combining hypothesis 3 with the NP *Bill*. Similarly, hypothesis 1 can be eliminated by replacing it with the TV φ₃ ◦ bill on the left-hand side of the third line from the bottom. By making these two simplifications, the derivation in (32) is obtained.

The relationship between the more complex proof in (33) and the simpler one in (32) is parallel to the relationship between an unreduced lambda term (such as λf₃.[λf₁.f₁(ι(**bk**))]([λf₂.f₂(**b**)](f₃))) and its β-normal form (i.e. λf.f(**b**)(ι(**bk**))). In fact, there is a formally precise one-to-one relationship between linear logic (of which the Lambek calculus is a non-commutative variant) and the typed lambda calculus, known as the *Curry-Howard Isomorphism* (Howard 1980), according to which the lambda term that represents the proof (33) β-reduces to the term that represents the proof (32).<sup>13</sup> Technically, this is known as *proof normalization* (Jäger 2005: 36–42, 137–144 contains a particularly useful discussion of this notion).
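Spelled out with the variable names used in (32) and (33), the normalization consists of two β-reduction steps, one for each redundant hypothesis:

$$\lambda f_{3}.\big[\lambda f_{1}.f_{1}(\iota(\mathbf{bk}))\big]\big(\big[\lambda f_{2}.f_{2}(\mathbf{b})\big](f_{3})\big) \;\leadsto_{\beta}\; \lambda f_{3}.\big[\lambda f_{1}.f_{1}(\iota(\mathbf{bk}))\big]\big(f_{3}(\mathbf{b})\big) \;\leadsto_{\beta}\; \lambda f_{3}.f_{3}(\mathbf{b})(\iota(\mathbf{bk}))$$

The first step eliminates the detour through hypothesis 2, the second the detour through hypothesis 1, mirroring the two simplifications just described.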

Thus, the logic-based architecture of the Lambek calculus (and various versions of TLCG, which are all extensions of the Lambek calculus) enables us to say, in a technically precise way, how (32) and (33) are the "same" (or, more precisely, equivalent), by building on independently established results in mathematical logic and computer science. This is one big advantage of taking seriously the view, advocated by the TLCG research, that "language *is* logic".

### **2.5.2 Extending the Lambek calculus**

Hypothetical reasoning is a very powerful (yet systematic) tool, but with forward and backward slashes, it is only good for analyzing expressions missing some material at the (right or left) periphery. This is problematic in the analyses of many linguistic phenomena, such as *wh*-extraction (where the "gap" can be in a sentence-medial position – recall the discussion about crossed composition rules in CCG in Section 2.4.2) and quantifier scope (where the quantifier needs to covertly move from a sentence-medial position), as well as various kinds of discontinuous constituency phenomena (see, for example, Morrill et al. 2011, which contains analyses of various types of discontinuous constituency phenomena in a recent version of TLCG known as "Displacement Calculus"). In what follows, I sketch one particular, relatively recent approach to this problem, known as *Hybrid Type-Logical Categorial Grammar* (Hybrid TLCG; Kubota 2010; 2015; Kubota & Levine 2015; Kubota & Levine 2020). This approach combines the Lambek calculus with Oehrle's (1994) term-labeled calculus, which deals with discontinuity by employing λ-binding in the prosodic component.

<sup>13</sup>There is a close relationship between these lambda terms representing proofs (i.e. syntactic derivations) and the lambda terms that one writes to notate semantic translations, especially if the latter are written at each step of the derivation *without* performing β-reduction. But it is important to keep in mind that lambda terms representing syntactic proofs and lambda terms notating semantic translations are distinct things.

Hybrid TLCG extends the Lambek calculus with the Elimination and Introduction rules for the *vertical slash*:

(34) a. Vertical Slash Introduction:
$$\frac{\begin{array}{c}[\phi;\, x;\, A]^{n}\\ \vdots\\ \mathsf{b};\, \mathcal{F};\, B\end{array}}{\lambda\phi.\mathsf{b};\, \lambda x.\mathcal{F};\, B{\upharpoonright}A}\,{\upharpoonright}\mathrm{I}^{n}$$

	b. Vertical Slash Elimination:
$$\frac{\mathsf{a};\, \mathcal{F};\, A{\upharpoonright}B \qquad \mathsf{b};\, \mathcal{G};\, B}{\mathsf{a}(\mathsf{b});\, \mathcal{F}(\mathcal{G});\, A}\,{\upharpoonright}\mathrm{E}$$

These rules make it possible to model what (roughly) corresponds to syntactic movement operations in mainstream generative grammar. This is illustrated in (35) for the ∀ *>* ∃ reading for the sentence *Someone talked to everyone today*.

(35)
$$\dfrac{\lambda\sigma.\sigma(\mathsf{everyone});\, \mathbf{A\,person};\, \mathrm{S}{\upharpoonright}(\mathrm{S}{\upharpoonright}\mathrm{NP}) \qquad \dfrac{\dfrac{\lambda\sigma.\sigma(\mathsf{someone});\, \mathbf{E\,person};\, \mathrm{S}{\upharpoonright}(\mathrm{S}{\upharpoonright}\mathrm{NP}) \qquad \dfrac{\dfrac{\dfrac{[\phi_{2};\, x_{2};\, \mathrm{NP}]^{2} \qquad \dfrac{\mathsf{talked} \circ \mathsf{to};\, \mathbf{talked\text{-}to};\, (\mathrm{NP}\backslash\mathrm{S})/\mathrm{NP} \qquad [\phi_{1};\, x_{1};\, \mathrm{NP}]^{1}}{\mathsf{talked} \circ \mathsf{to} \circ \phi_{1};\, \mathbf{talked\text{-}to}(x_{1});\, \mathrm{NP}\backslash\mathrm{S}}\,/\mathrm{E}}{\phi_{2} \circ \mathsf{talked} \circ \mathsf{to} \circ \phi_{1};\, \mathbf{talked\text{-}to}(x_{1})(x_{2});\, \mathrm{S}}\,\backslash\mathrm{E} \qquad \mathsf{today};\, \mathbf{tdy};\, \mathrm{S}\backslash\mathrm{S}}{\phi_{2} \circ \mathsf{talked} \circ \mathsf{to} \circ \phi_{1} \circ \mathsf{today};\, \mathbf{tdy}(\mathbf{talked\text{-}to}(x_{1})(x_{2}));\, \mathrm{S}}\,\backslash\mathrm{E}}{\lambda\phi_{2}.\phi_{2} \circ \mathsf{talked} \circ \mathsf{to} \circ \phi_{1} \circ \mathsf{today};\, \lambda x_{2}.\mathbf{tdy}(\mathbf{talked\text{-}to}(x_{1})(x_{2}));\, \mathrm{S}{\upharpoonright}\mathrm{NP}}\,{\upharpoonright}\mathrm{I}^{2}\ \text{①}}{\mathsf{someone} \circ \mathsf{talked} \circ \mathsf{to} \circ \phi_{1} \circ \mathsf{today};\, \mathbf{E\,person}(\lambda x_{2}.\mathbf{tdy}(\mathbf{talked\text{-}to}(x_{1})(x_{2})));\, \mathrm{S}}\,{\upharpoonright}\mathrm{E}\ \text{②}}{\lambda\phi_{1}.\mathsf{someone} \circ \mathsf{talked} \circ \mathsf{to} \circ \phi_{1} \circ \mathsf{today};\, \lambda x_{1}.\mathbf{E\,person}(\lambda x_{2}.\mathbf{tdy}(\mathbf{talked\text{-}to}(x_{1})(x_{2})));\, \mathrm{S}{\upharpoonright}\mathrm{NP}}\,{\upharpoonright}\mathrm{I}^{1}}{\mathsf{someone} \circ \mathsf{talked} \circ \mathsf{to} \circ \mathsf{everyone} \circ \mathsf{today};\, \mathbf{A\,person}(\lambda x_{1}.\mathbf{E\,person}(\lambda x_{2}.\mathbf{tdy}(\mathbf{talked\text{-}to}(x_{1})(x_{2}))));\, \mathrm{S}}\,{\upharpoonright}\mathrm{E}$$

A quantifier has the ordinary GQ meaning (**E person** and **A person** abbreviate the terms λP.∃x[**person**(x) ∧ P(x)] and λP.∀x[**person**(x) → P(x)], respectively), but its phonology is a function of type (**st** → **st**) → **st** (where **st** is the type of strings).


By abstracting over the position into which the quantifier "lowers" in an S via the Vertical Slash Introduction rule (34a), an expression of type S↾NP (phonologically **st** → **st**) is obtained (①), which is then given as an argument to the quantifier. Then, by function application via ↾E (②), the subject quantifier *someone* semantically scopes over the sentence and lowers its phonology to the "gap" position kept track of by λ-binding in phonology (note that this result obtains by function application and β-reduction of the prosodic term). The same process takes place for the object quantifier *everyone* to complete the derivation. The scopal relation between multiple quantifiers depends on the order of application of this hypothetical reasoning. The surface scope reading is obtained by switching the order of the hypothetical reasoning for the two quantifiers (which results in the same string of words, but with the opposite scope relation).
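The prosodic half of this mechanism can be isolated in a few lines of Haskell (my own stand-in encoding; the chapter's fragment is of course not a program):

```haskell
-- Prosodies of type (st -> st) -> st lower a quantifier's string into a gap.
type Str = String

someoneP, everyoneP :: (Str -> Str) -> Str
someoneP  sigma = sigma "someone"
everyoneP sigma = sigma "everyone"

-- The prosody of the S|NP derived at step (2) of (35): the subject gap is
-- already filled by "someone"; the object position is still abstracted over.
core :: Str -> Str
core obj = unwords ["someone", "talked", "to", obj, "today"]

-- everyoneP core  ==  "someone talked to everyone today"
-- Both scopings yield this same string; only the order of the two
-- hypothetical-reasoning steps (and hence the semantic scope) differs.
```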

This formalization of quantifying-in by Oehrle (1994) was later extended by Barker (2007) to more complex types of scope-taking phenomena known as *parasitic scope* in the analysis of symmetrical predicates (such as *same* and *different*).<sup>14</sup> Empirical applications of parasitic scope include "respective" readings (Kubota & Levine 2016b), "split scope" of negative quantifiers (Kubota & Levine 2016a) and modified numerals such as *exactly N* (Pollard 2014).

Hypothetical reasoning with prosodic λ-binding enables a simple analysis of *wh*-extraction too, as originally noted by Muskens (2003: 39–40). The key idea is that sentences with medial gaps can be analyzed as expressions of type S↾NP, as in the derivation for (36) in (37).

(36) Bagels, Kim gave \_ to Chris.

(37)
$$\dfrac{\mathsf{bagels};\, \mathbf{b};\, \mathrm{NP} \qquad \dfrac{\lambda\sigma\lambda\phi.\phi \circ \sigma(\epsilon);\, \lambda F.F;\, (\mathrm{S}{\upharpoonright}X){\upharpoonright}(\mathrm{S}{\upharpoonright}X) \qquad \dfrac{\dfrac{\mathsf{kim};\, \mathbf{k};\, \mathrm{NP} \qquad \dfrac{\dfrac{\mathsf{gave};\, \mathbf{gave};\, \mathrm{VP/PP/NP} \qquad [\phi;\, x;\, \mathrm{NP}]^{1}}{\mathsf{gave} \circ \phi;\, \mathbf{gave}(x);\, \mathrm{VP/PP}}\,/\mathrm{E} \qquad \mathsf{to} \circ \mathsf{chris};\, \mathbf{c};\, \mathrm{PP}}{\mathsf{gave} \circ \phi \circ \mathsf{to} \circ \mathsf{chris};\, \mathbf{gave}(x)(\mathbf{c});\, \mathrm{VP}}\,/\mathrm{E}}{\mathsf{kim} \circ \mathsf{gave} \circ \phi \circ \mathsf{to} \circ \mathsf{chris};\, \mathbf{gave}(x)(\mathbf{c})(\mathbf{k});\, \mathrm{S}}\,\backslash\mathrm{E}}{\lambda\phi.\mathsf{kim} \circ \mathsf{gave} \circ \phi \circ \mathsf{to} \circ \mathsf{chris};\, \lambda x.\mathbf{gave}(x)(\mathbf{c})(\mathbf{k});\, \mathrm{S}{\upharpoonright}\mathrm{NP}}\,{\upharpoonright}\mathrm{I}^{1}\ \text{①}}{\lambda\phi.\phi \circ \mathsf{kim} \circ \mathsf{gave} \circ \mathsf{to} \circ \mathsf{chris};\, \lambda x.\mathbf{gave}(x)(\mathbf{c})(\mathbf{k});\, \mathrm{S}{\upharpoonright}\mathrm{NP}}\,{\upharpoonright}\mathrm{E}\ \text{②}}{\mathsf{bagels} \circ \mathsf{kim} \circ \mathsf{gave} \circ \mathsf{to} \circ \mathsf{chris};\, \mathbf{gave}(\mathbf{b})(\mathbf{c})(\mathbf{k});\, \mathrm{S}}\,{\upharpoonright}\mathrm{E}$$

<sup>14</sup>"Parasitic scope" is a notion coined by Barker (2007) where, in transformational terms, some expression takes scope at LF by parasitizing on the scope created by a different scopal operator's LF movement. In versions of (TL)CG of the sort discussed here, this corresponds to double lambda-abstraction via the vertical slash.


Here, after deriving an S↾NP, which keeps track of the gap position via the bound variable φ, the topicalization operator fills in the gap with an empty string and concatenates the topicalized NP to the left of the string thus obtained. This way, the difference between "overt" and "covert" movement reduces to a lexical difference in the prosodic specifications of the operators that induce them. A covert movement operator throws in some material in the gap position, whereas an overt movement operator "closes off" the gap with an empty string.
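The prosodic contrast between the two kinds of operators can be sketched in the same style (again my own stand-ins):

```haskell
-- An "overt movement" operator: close off the gap with the empty string
-- and concatenate the filler on the left, as in (37).
type Str = String

topicalize :: (Str -> Str) -> (Str -> Str)
topicalize sigma = \filler -> filler ++ " " ++ sigma ""

-- "Kim gave _ to Chris" with a prosodic gap:
gapped :: Str -> Str
gapped phi = unwords (filter (not . null) ["kim", "gave", phi, "to", "chris"])

-- topicalize gapped "bagels"  ==  "bagels kim gave to chris"
-- A "covert movement" operator would instead insert overt material
-- (e.g. "everyone") into the same gap position.
```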

As illustrated above, hypothetical reasoning for the Lambek slashes / and \ and for the vertical slash ↾ has important empirical motivations, but the real strength of a "hybrid" system like Hybrid TLCG, which recognizes both types of slashes, is that it extends automatically to cases in which "directional" and "nondirectional" phenomena interact. A case in point comes from the interaction of nonconstituent coordination and quantifier scope. Examples such as those in (38) allow for at least a reading in which the shared quantifier outscopes conjunction.<sup>15</sup>

(38) a. I gave a couple of books to Pat on Monday and to Sandy on Tuesday.

	- b. Terry said nothing to Robin on Thursday or to Leslie on Friday.

I now illustrate how this wide scope reading for the quantifier in NCC sentences like (38) is immediately predicted to be available in the fragment developed so far (Hybrid TLCG actually predicts both scopal relations for all NCC sentences; see Kubota & Levine 2015: Section 4.3 for how the distributive scope is licensed). The derivation for (38b) is given in (39) below. The key point in this derivation is that, via hypothetical reasoning, the string *to Robin on Thursday or to Leslie on Friday* forms a syntactic constituent with a full-fledged meaning assigned to it in the usual way. Then the quantifier takes scope above this whole coordinate structure, yielding the non-distributive, quantifier wide-scope reading.

Licensing the correct scopal relation between the quantifier and conjunction in the analysis of NCC remains a challenging problem in the HPSG literature. See Section 4.2.1 for some discussion.

<sup>15</sup>Whether the other scopal relation (one in which the quantifier meaning is "distributed" to each conjunct, as in the paraphrase "I gave a couple of books to Pat on Monday and I gave a couple of books to Sandy on Tuesday" for (38a)) is possible seems to depend on various factors. With downward-entailing quantifiers as in (38b), this reading seems difficult to obtain without heavy contextualization and appropriate intonational cues. See Kubota & Levine (2015: Section 2.2) for some discussion.

(39)
$$\dfrac{\vdots}{\mathsf{terry} \circ \mathsf{said} \circ \mathsf{nothing} \circ \mathsf{to} \circ \mathsf{robin} \circ \mathsf{on} \circ \mathsf{thursday} \circ \mathsf{or} \circ \mathsf{to} \circ \mathsf{leslie} \circ \mathsf{on} \circ \mathsf{friday};\; \neg\mathbf{E\,thing}(\lambda x.\mathbf{onTh}(\mathbf{said}(x)(\mathbf{r}))(\mathbf{t}) \vee \mathbf{onFr}(\mathbf{said}(x)(\mathbf{l}))(\mathbf{t}));\, \mathrm{S}}$$

# **3 Architectural similarities and differences**

# **3.1 Broad architecture**

One important property common to HPSG and CG is that they are both lexicalist theories of syntax in the broader sense.<sup>16</sup> This is partly due to an explicit choice made at an early stage of the development of HPSG to encode valence information in the syntactic categories of linguistic expressions, following CG (see Flickinger, Pollard & Wasow 2021: 57, Chapter 2 of this volume and Davis & Koenig 2021: Section 3.2, Chapter 4 of this volume).<sup>17</sup> The two theories share many similarities in the analyses of specific linguistic phenomena due to this basic architectural similarity. For example, many phenomena that are treated by means of local movement operations (or via empty categories) in mainstream generative syntax, such as passivization, raising/control in English and complex predicate phenomena in a typologically broad range of languages, are generally treated by the sharing of valence information in the lexicon in these theories. For HPSG analyses of these phenomena, see Davis, Koenig & Wechsler (2021), Chapter 9 of this volume, Godard & Samvelian (2021), Chapter 11 of this volume and Abeillé (2021), Chapter 12 of this volume. Steedman & Baldridge (2011) contains a good summary of CG analyses of local dependencies (passivization, raising/control). Kubota (2014: Section 4.2) contains a comparison of HPSG and CG analyses of complex predicates. The heavy reliance on lexicalist analyses of local dependencies is perhaps the most important property shared by HPSG and various versions of CG.

But emphasizing this commonality too much may be a bit misleading, since the valence features of HPSG and the slash connectives in CG have very different ontological statuses in the respective theories. The valence features in HPSG are primarily specifications, closely tied to the specific phrase structure rules, that dictate the ways in which hierarchical representations are built. To be sure, the lexical specifications of the valence information play a key role in the movement-free analyses of local dependencies along the lines noted above, but still, there is a rather tight connection between these valence specifications originating in the lexicon and the ways in which they are "canceled" in specific phrase structure rules.

<sup>16</sup>I say "broader sense" here since not all variants of either HPSG or CG subscribe to the so-called Lexical Integrity Hypothesis (see Davis & Koenig 2021: Section 2, Chapter 4 of this volume), which says that syntax and morphology are distinct components of grammar. For example, in the CG literature, the treatments of verb clustering in Dutch by Moortgat & Oehrle (1994) and in Japanese by Kubota (2014) seem to go against the tenet of the Lexical Integrity Hypothesis. In HPSG, Gunji (1999) formulates an analysis of Japanese causatives that does not adhere to the Lexical Integrity Hypothesis and which contrasts sharply with the strictly lexicalist analysis by Manning et al. (1999). See also Davis & Koenig (2021), Chapter 4 of this volume, Bruening (2018a,b), Müller (2018) and Müller & Wechsler (2014) for some discussion on lexicalism.

<sup>17</sup>This point is explicitly noted by the founders of HPSG in the following passage in Pollard & Sag (1987):

A third principle of universal grammar posited by HPSG, the *Subcategorization Principle*, is essentially a generalization of the "argument cancellation" employed in categorial grammar. (Pollard & Sag 1987: 11)

Things are quite different in CG, especially in TLCG. As discussed in Section 2, TLCG views the grammar of natural language *not* as a structure-building system, but as a logical deductive system. The two slashes / and \ are thus not "features" that encode the subcategorization properties of words in the lexicon, but have a much more general and fundamental role within the basic architecture of grammar in TLCG. These connectives are literally implicational connectives within a logical calculus. Thus, in TLCG, "derived" rules such as Type Raising and Function Composition are *theorems*, in just the same way that the transitivity inference is a theorem in classical propositional logic. Note that this is not just a matter of high-level conceptual organization of the theory, since, as discussed in Section 2, the ability to assign "constituent" statuses to non-canonical constituents in the CG analyses of NCC directly exploits this property of the underlying calculus. The straightforward mapping from syntax to semantics discussed in Section 2.3 is also a direct consequence of adopting this "derivation as proof" perspective on syntax, building on the results of the Curry-Howard correspondence (Howard 1980) in setting up the syntax-semantics interface.<sup>18</sup>

Another notable difference between (especially a recent variant of) HPSG and CG is that CG currently lacks a detailed theory of (phrasal) "constructions", that is, patterns and (sub)regularities that are exhibited by linguistic expressions and that cannot (at least according to the proponents of "constructionist" approaches) be lexicalized easily. As discussed in Müller (2021c), Chapter 32 of this volume (see also Sag 1997, Fillmore 1999 and Ginzburg & Sag 2000), recent constructional variants of HPSG (e.g., Sag's (1997) Constructional HPSG as assumed in this volume and Sign-Based Construction Grammar (SBCG; Sag, Boas & Kay 2012)) incorporate ideas from Construction Grammar (Fillmore et al. 1988) and capture such generalizations via a set of constructional templates (or schemata), which are essentially a family of related phrase structure rules that are organized in a type inheritance hierarchy.

<sup>18</sup>Although CCG does not embody the idea of "derivation as proof" as explicitly as TLCG does, it remains true to a large extent that the role of the slash connective within the overall theory is largely similar in CCG and TLCG in that CCG and TLCG share many key ideas in the analyses of actual empirical phenomena.


Such an architecture seems nearly impossible to implement literally in CG, except via empty operators or lexical operations corresponding to each such constructional schema. In particular, in TLCG, syntactic rules are logical inference rules, so, if one strictly adheres to its slogan "language is logic", there is no option to freely add syntactic rules in the deductive system. The general consensus in the literature seems to be that while many of the phenomena initially adduced as evidence for a constructional approach can be lexicalized (see, for example, Müller & Wechsler (2014) and Müller (2021c), Chapter 32 of this volume; see also Steedman & Baldridge (2011: 202), which discusses ways in which some of the empirical generalizations that Goldberg (1995) adduces to the notion of constructions can be lexicalized within CCG), there remain some real challenges for a strictly lexicalist approach (Müller 2021c: Section 4.1, Chapter 32 of this volume identifies the *N after N* construction as an instance of this latter type of phenomenon). It then seems undeniable that the grammar of natural language is equipped with mechanisms for dealing with "peripheral" patterns, but whether such mechanisms should be given a central role in the architecture of grammar is still a highly controversial issue. Whatever position one takes, it is important to keep in mind that this is ultimately an empirical question (a very complex and tough one indeed) that should be settled on the basis of (various types of) evidence.

# **3.2 Syntax–semantics interface**

As should be clear from the exposition in Section 2, both CCG and TLCG (at least in the simplest form) adopt a very rigid, one-to-one correspondence between syntax and semantics. Steedman's work on CCG has demonstrated that this simple and systematic mapping between syntax and semantics enables attractive analyses of a number of empirical phenomena at the syntax-semantics interface, including some notorious problems such as the scope parallelism issue in right-node raising known as the Geach paradigm (*Every boy loves, and every girl detests, some saxophonist*; cf. Geach 1970: 8). Other important work on issues at the syntax-semantics interface includes Jacobson's (1999; 2000) work on pronominal anaphora in Variable-Free Semantics (covering a wide range of phenomena including the paycheck/Bach-Peters paradigms and binding parallelism in right-node raising), Barker & Shan's (2015) work on "continuation-based" semantics (weak crossover, superiority effects and "parasitic scope" treatments of symmetrical predicates and sluicing) and Kubota and Levine's (2015; 2017; 2020) Hybrid TLCG, dealing with interactions between coordination, ellipsis and scopal phenomena.


As discussed in Koenig & Richter (2021), Chapter 22 of this volume, recent HPSG work on complex empirical phenomena at the syntax-semantics interface makes heavy use of underspecification. For example, major analyses of nonconstituent coordination in recent HPSG use some version of an underspecification framework to deal with complex interactions between coordination and scopal operators (Yatabe 2001; Beavers & Sag 2004; Park et al. 2019; Park 2019; Yatabe & Tam 2021). In a sense, HPSG retains a rigid phrase structure-based syntax (modulo the flexibility entertained with the use of the linearization-based architecture) and deals with the complex mapping to semantics via the use of underspecification languages in the semantic component (such as Minimal Recursion Semantics by Copestake et al. 2005 and Lexical Resource Semantics by Richter & Sailer 2004; see also Koenig & Richter 2021, Chapter 22 of this volume). CG, on the other hand, tends to adhere more closely to a tight mapping from syntax to semantics, but makes the syntactic component itself flexible. But it is important to keep in mind that, even within the CG research community, there is no clear consensus about how strictly one should adhere to the Montagovian notion of compositionality – a glimpse of the recent literature reveals that the issue is very much an open-ended one: many contemporary variants of CG make use of underspecification for certain purposes (see, for example, Steedman 2012: Chapter 7, Bekki 2014, Bekki & Mineshima 2017 and Kubota et al. 2019), while at the same time Jacobson's (1999; 2000) program of Variable-Free Semantics is distinct in explicitly taking the classical notion of compositionality as a driving principle.

# **3.3 Morpho-syntax and word order**

While there is relatively less detailed work on morphology and the morphosyntax interface in CG as compared to HPSG, there are several ideas originating in the CG literature that have either influenced some HPSG work or which are closely related to a certain line of work in HPSG. I review some of these in this section.<sup>19</sup>

# **3.3.1 Linearization-based HPSG and the phenogrammar/tectogrammar distinction in CG**

The idea of separating surface word order and the underlying combinatorics, embodied in the so-called *linearization-based* version of HPSG (Reape 1994; Müller 1995; Kathol 2000; cf. Müller 2021b: Section 6, Chapter 10 of this volume), has its origin in the work by the logician Haskell Curry (1961), in which he proposed the distinction between *phenogrammar* (the component pertaining to surface word order) and *tectogrammar* (underlying combinatorics). This same idea has influenced a certain line of work in the CG literature too. Important early work was done by Dowty (1982a; 1996) in a variant of CG which is essentially an AB Grammar with "syncategorematic" rules that directly manipulate string representations, of the sort utilized in Montague Grammar, for dealing with various sorts of discontinuous constituency.<sup>20</sup>

<sup>19</sup>An important omission in the ensuing discussion is a comparison of recent work in HPSG on morphology by Olivier Bonami and Berthold Crysmann (see Crysmann 2021: Section 4, Chapter 21 of this volume), which builds on and extends Greg Stump's *Paradigm Function Morphology* (PFM; Stump 2001), and early CG work on morphology (Hoeksema 1984; Moortgat 1984; Hoeksema & Janda 1988; Raffelsiefen 1992) which could be viewed as precursors of PFM.

Dowty's early work has influenced two separate lines of work in the later development of CG. First, a more formally sophisticated implementation of an enriched theory of the phenogrammatical component of the sort sketched in Dowty (1996) was developed in the literature on Multi-Modal Categorial Type Logics in the 90s, by exploiting the notion of "modal control" (as already noted, this technique was later incorporated into CCG by Baldridge 2002: Chapter 5). Some empirical work in this line of research includes Moortgat & Oehrle (1994) (on Dutch cross-serial dependencies; see also Dowty 1997: Section 4 for an accessible exposition of this analysis), Kraak (1998) (French clitic climbing), Whitman (2009) ("right-node wrapping" in English) and Kubota (2010; 2014) (complex predicates in Japanese). Second, the Curry/Dowty idea of the pheno/tecto distinction has also been the core motivation for the underlying architecture of a family of approaches called *Linear Categorial Grammar* (LCG; Oehrle 1994; de Groote 2001; Muskens 2003; Mihaliček & Pollard 2012; Pollard 2013), in which, following the work of Oehrle (1994), the prosodic component is modeled as a lambda calculus (cf. Section 2.5.2) for dealing with complex operations pertaining to word order (the more standard approach in the TLCG tradition is to model the prosodic component as some sort of algebra of structured strings as in Morrill et al. 2011 (and at least implicitly in Moortgat 1997: Section 4)). In fact, among different variants of CG, LCG can be thought of as an extreme approach that removes word order completely from the combinatorics, by doing away with the distinction between the Lambek forward and backward slashes.

One issue that arises for approaches that distinguish between the levels of phenogrammar and tectogrammar, across the HPSG/CG divide, is how closely these two components interact with one another. Kubota (2014: Section 2.3) discusses some data in the morpho-syntax of complex predicates in Japanese which

<sup>20</sup>See also Flickinger, Pollard & Wasow (2021: Section 2.2), Chapter 2 of this volume for a discussion of the influence that early forms of CG (Bach 1979; 1980; Dowty 1982a; Dowty 1982b) had on Head Grammar (Pollard 1984), a precursor of HPSG.


(according to him) would call for an architecture of grammar in which the pheno and tecto components interact with one another closely, and which would thus be problematic for the simpler LCG-type architecture. It would be interesting to see whether/to what extent this same criticism would carry over to linearization-based HPSG, which is similar (at least in its simplest form) to LCG in maintaining a clear separation of the pheno/tecto components.<sup>21</sup>

### **3.3.2 Syntactic features and feature neutralization**

As compared to HPSG, the status of syntactic features in CG is somewhat unclear, despite the fact that such "features" are often used in linguistic analyses in the CG literature. One reason that a full-blown theory of syntactic features has not been developed in CG research to date seems to be that, as compared to HPSG, syntactic features play a far less major role in linguistic analysis in CG. Another possible reason is that empirical studies of complex linguistic phenomena (especially on languages other than English) are still very few in number in CG.

It is certainly conceivable to develop a theory of syntactic features and feature underspecification within CG by borrowing ideas from HPSG, which already has a rich tradition of foundational work on this issue. In fact, the work on Unification-based Categorial Grammar (Calder, Klein & Zeevat 1988) explored at the end of the 80s seems to have had precisely such a goal. Unfortunately, this approach remains largely isolated from other developments in the literature (of either CG or other grammatical theories/formalisms). Another possibility would be to pursue a more logic-based approach. For some ideas, see Bayer & Johnson (1995), Bayer (1996) and Morrill (1994). Morrill (1994: Chapter 6) in particular briefly explores the idea of implementing syntactic features via the notion of *dependent types*. There is some renewed interest in the linguistic application of ideas from Dependent Type Theory (Martin-Löf 1984) in the recent literature of CG and formal semantics (see, for example, Chatzikyriakidis & Luo 2017), so pursuing this latter type of approach in connection with this new line of work may lead to some interesting developments.

One issue that is worth noting in connection to syntactic features is the treatment of case syncretism and feature neutralization (cf. Przepiórkowski 2021: Section 3, Chapter 7 of this volume). The work by Morrill (1994: Chapter 6), Bayer (1996) and Bayer & Johnson (1995) mentioned above proposed an approach to

<sup>21</sup>But note also in this connection that linearization-based HPSG is by no means monolithic; for example, Yatabe & Tam (2021) (discussed below in Section 4.2.1) propose a somewhat radical extension of the linearization-based approach in which semantic composition is done at the level of word order domains.


feature neutralization by positing meet and join connectives (which are like conjunction and disjunction in propositional logic) in CG. The key idea of this approach was recast in HPSG by means of inheritance hierarchies by Levy (2001), Levy & Pollard (2002) and Daniels (2002).<sup>22</sup> See Przepiórkowski (2021: Section 3), Chapter 7 of this volume for an exposition of this HPSG work on feature neutralization.

# **4 Specific empirical phenomena**

Part II of the present handbook contains an excellent introduction to recent developments of HPSG research on major linguistic phenomena. I will therefore presuppose familiarity with such recent analyses, and my discussion below aims to highlight the differences between HPSG and CG in the analyses of selected empirical phenomena. In order to make the ensuing discussion maximally informative, I focus on phenomena over which there is some ongoing major cross-theoretical debate, and those for which I believe one or the other theory would benefit from the recent developments and rich research tradition of the other.

# **4.1 Long-distance dependencies**

As noted in Section 2.4, CCG treats long-distance dependencies via a sequence of Function Composition, which is similar to the SLASH percolation analysis in HPSG. CCG offers a treatment of major aspects of long-distance dependencies, including island effects (Steedman 2000: Section 4.2) and parasitic gaps (Steedman 1987). Earlier versions of CCG involved a somewhat ad-hoc stipulation on the use of crossed composition rules (Steedman 1996). This was overcome in the more recent, multi-modal variant of CCG (Baldridge 2002), which controls the application of such non-order-preserving rules via a fine-grained system of lexicalized modality. The modality specifications in this new version of CCG enable one to relocate language-specific idiosyncrasies to the lexicon, in line with the general spirit of lexicalist theories of grammar.

The situation is somewhat different in TLCG. TLCG typically makes use of a movement-like operation for the treatment of extraction phenomena (via hypothetical reasoning), but the specific implementations differ considerably in different variants of TLCG. Major alternatives include the approach in terms of "structural control" in Multi-Modal Categorial Type Logics (cf. Bernardi 2002: Chapter 1; Moortgat 2011: Section 2.4; see also Morrill 1994: Chapter 7), and the one involving prosodic λ-binding in LCG and related approaches (see Section 2.5.2). In either approach, extraction phenomena are treated by means of some form of hypothetical reasoning, and this raises a major technical issue in the treatment of multiple gap phenomena. The underlying calculus of TLCG is a version of linear logic, and this means that the implication connective is resource sensitive. This is problematic in situations in which a single filler corresponds to multiple gaps, as in parasitic gaps and related phenomena. These cases of extraction require some sort of extension of the underlying logic or some special operator that is responsible for resource duplication. Currently, the most detailed treatment of extraction phenomena in the TLCG literature is Morrill (2017), which lays out in detail an analysis of long-distance dependencies capturing both major island constraints and parasitic gaps within the most recent version of Morrill's Displacement Calculus.

There are several complex issues that arise in relation to the linguistic analysis of extraction phenomena. One major open question is whether island constraints should be accounted for within narrow grammar. Both Steedman and Morrill follow the standard practice in Generative Grammar research in taking island effects to be syntactic, but this consensus has been challenged by a new body of research in the recent literature proposing various alternative explanations on different types of island constraints (some important work in this tradition includes Deane (1992), Kluender (1998), Hofmeister & Sag (2010) and Chaves & Putnam (2020); see Chaves (2021), Chapter 15 of this volume, Levine (2017) and Newmeyer (2016) for an overview of this line of work and pointers to the relevant literature). Recent syntactic analyses of long-distance dependencies in the HPSG literature explicitly avoid directly encoding major island constraints within the grammar (Sag 2010; Chaves 2012b). Unlike CCG and Displacement Calculus, Kubota & Levine's Hybrid TLCG opts for this latter type of view (that is, the one that is generally in line with recent HPSG work; see Kubota & Levine 2020: Chapter 10).

Another major empirical problem related to the analysis of long-distance dependencies is the so-called *extraction pathway marking* phenomenon (McCloskey 1979; Zaenen 1983). While this issue received considerable attention in the HPSG literature, through a series of works by Levine and Hukari (see Levine & Hukari 2006), there is currently no explicit treatment of this phenomenon in the CG literature. CCG can probably incorporate the HPSG analysis relatively easily, given the close similarity between the SLASH percolation mechanism and the step-by-step inheritance of the /NP specification in the Function Composition-based approach in CCG. Extraction pathway marking poses a much trickier challenge to TLCG, in which extraction is typically handled by a single-chain movement-like process by means of hypothetical reasoning (but see Kubota & Levine (2020: Chapter 7) for a sketch of a possible approach which mimics successive cyclic movement in the type-logical setup).

Finally, pied-piping poses a somewhat tricky issue for the analysis of relativization in CG (see, for example, Pollard 1988; Morrill 1994; Müller 2019: Section 8.6; see also Arnold & Godard (2021: footnote 3), Chapter 14 of this volume). To see this point, note that the analysis of (simple cases of) relative clauses in CCG outlined in Section 2.4.2 above does not straightforwardly extend to pied-piping examples such as the following:

(40) a. This is the student to whom Kim gave the book.

	- b. Reports the height of the lettering on the covers of which the government prescribes should be abolished. (Ross 1967: 109)

In these examples, the relative pronoun is embedded inside the fronted relative phrase, so a simple (N\N)/(S/NP) assignment doesn't work. Morrill (1994: Chapter 4, Section 3.3) proposes a more sophisticated treatment in TLCG (see Carpenter 1998: Section 9.7 for a lucid exposition of this analysis), which can be thought of as a translation of the HPSG analysis (Pollard & Sag 1994: Chapter 5) involving two types of long-distance dependency (handled by the REL and SLASH features in HPSG; see also Arnold & Godard (2021), Chapter 14 of this volume).

In Hybrid TLCG, Morrill's analysis of pied-piping can be implemented by positing the following lexical entry for the relative pronoun *whom* (for examples such as (40a) where the fronted relative phrase is an argument PP; the entry needs to be generalized to cover other cases involving fronted elements with different syntactic categories):

(41) λσ₁λσ₂.σ₁(whom) ◦ σ₂(ε); λF.λG.λP.λx.G(F(x)) ∧ P(x); (N\N)↾(S↾PP)↾(PP↾NP)

The entry in (41) says that the relative pronoun takes two arguments, a PP missing an NP inside itself and an S missing a PP, and then becomes a nominal modifier. Note that the two types of long-distance dependency mediated by REL and SLASH in HPSG are both handled by the vertical slash in this analysis. The relative pronoun itself is embedded inside the PP in the prosodic representation to form a relative phrase which appears as a fronted expression in the surface string.

Since the vertical slash mediates long-distance dependencies, this analysis avoids the problem of an ad-hoc proliferation of lexical entries for pied-piped relative pronouns corresponding to different levels of embedding (which was the main point of criticism in Pollard's (1988) critique of an earlier CG analysis). In this sense, this CG analysis is a fairly straightforward reimplementation of the Pollard & Sag (1994) analysis. One possible difference between the HPSG analysis and the CG analysis of the sort sketched above is that the latter requires positing different lexical entries for relative pronouns corresponding to different syntactic types of the relative phrase. If it turns out that the constraints on what can be preposed are largely orthogonal to narrow syntax,<sup>23</sup> there may be an advantage to an analysis in HPSG that posits a general PS rule or constructional schema for licensing pied-piping relative clauses.

# **4.2 Coordination and ellipsis**

Coordination and ellipsis are both major issues in contemporary syntactic theory. There are, moreover, some phenomena, such as Gapping and Stripping, which seem to lie at the boundary of the two empirical domains (see, for example, the recent overview by Johnson 2018). There are some important similarities and differences between the analytic ideas entertained in the HPSG and CG literature for problems in these empirical domains.

### **4.2.1 Analyses of nonconstituent coordination**

CG is perhaps best known in the linguistics literature for its analysis of nonconstituent coordination. Steedman's work on CCG (Steedman 1996; 2000; 2012) in particular has shown how this analysis of coordination interacts smoothly with analyses of other major linguistic phenomena (such as long-distance dependencies, control and raising, and quantification) to achieve a surface-oriented grammar that has wide empirical coverage and at the same time has attractive computational properties. Kubota & Levine (2015; 2020) offer an up-to-date TLCG analysis of coordination and compare it with major alternatives in both the CCG and HPSG literature.

As compared to long-distance dependencies, coordination (in particular NCC) initially received considerably less attention in the (H)PSG literature (Sag et al. 1985 is an important exception in the early literature). Things started to change around 2000, with a series of related proposals appearing one after another, including Yatabe (2001), Beavers & Sag (2004), Chaves (2007) and Crysmann (2008) (see Abeillé & Chaves 2021: Section 7, Chapter 16 of this volume and Nykiel & Kim 2021: Section 6, Chapter 19 of this volume). Here, I take up Beavers & Sag (2004) and Yatabe (2001) (updated in Yatabe & Tam 2021) as two representative proposals in this line of work. The two proposals share some common assumptions and ideas, but they also differ in important respects.

<sup>23</sup>The question of which syntactic category can be pied-piped is actually a rather thorny issue. See Arnold & Godard (2021: Section 2.1.1), Chapter 14 of this volume for some discussion.

Both Beavers & Sag (2004) and Yatabe (2001) adopt linearization-based HPSG, together with (a version of) Minimal Recursion Semantics for semantics. Of the two, Beavers & Sag's analysis is more in line with standard assumptions in HPSG. The basic idea of Beavers & Sag's analysis is indeed very simple: by exploiting the flexible mapping between the combinatoric component and the surface word order realization in linearization-based HPSG, they essentially propose a surface deletion-based analysis of NCC according to which NCC examples are analyzed as follows:

(42) [<sub>S</sub> Terry gave no man a book on Friday] or [<sub>S</sub> ~~Terry gave no man~~ a record on Saturday].

where the material in strike-out is underlyingly present but undergoes deletion in the prosodic representation.

In its simplest form, this analysis gets the scopal relation between the quantifier and coordination wrong in examples like (42) (a well-known problem for the conjunction reduction analysis from the 70s; cf. Partee 1970). Beavers & Sag address this issue by introducing a constraint called *Optional Quantifier Merger*:

(43) *Optional Quantifier Merger*: For any elided phrase denoting a generalized quantifier in the domain of either conjunct, the semantics of that phrase may optionally be identified with the semantics of its non-elided counterpart.

As noted by Levine (2011) and Kubota & Levine (2015: Section 3.2.1), this condition does not follow from any general principle and is merely stipulated in Beavers & Sag's account.

Yatabe (2001) and Yatabe & Tam (2021) (the latter of which contains a much more accessible exposition of essentially the same proposal as the former) propose a somewhat different analysis. Unlike Beavers & Sag, who assume that semantic composition is carried out on the basis of the meanings of *signs* on each node (which is the standard assumption about semantic composition in HPSG), Yatabe shifts the locus of semantic composition to the list of domain objects, that is, the component that directly gets affected by the deletion operation that yields the surface string.


This crucially changes the default meaning predicted for examples such as (42). Specifically, on Yatabe's analysis, the surface string for (42) is obtained by the "compaction" operation on word order domains that collapses two quantifiers originally contained in the two conjuncts into one. The semantics of the whole sentence is computed on the basis of this resultant word order domain representation, which contains only *one* instance of a domain object corresponding to the quantifier. The quantifier is then required to scope over the whole coordinate structure due to independently motivated principles of underspecification resolution. While this approach successfully yields the wide-scope reading for quantifiers, the distributive, narrow scope reading for quantifiers (which was trivial for Beavers & Sag) now becomes a challenge. Yatabe & Tam simply stipulate a complex disjunctive constraint on semantic interpretation tied to the "compaction" operation that takes place in coordination so as to generate the two scopal readings.

Kubota & Levine (2015: Section 3.2.2) note that, in addition to the quantifier scope issue noted above, Beavers & Sag's approach suffers from similar problems in the interpretations of symmetrical predicates (*same*, *different*, etc.), summative predicates (*a total of X*, *X in total*, etc.) and the so-called "respective" readings of plural and conjoined expressions (see Chaves 2012a for a lucid discussion of the empirical parallels between the three phenomena and how the basic cases can receive a uniform analysis within HPSG). Yatabe & Tam (2021) offer a response to Kubota & Levine, working out explicit analyses of these more complex phenomena in linearization-based HPSG. A major point of disagreement between Kubota & Levine on the one hand and Yatabe & Tam on the other seems to be whether/to what extent an analysis of a linguistic phenomenon should aim to explain (as opposed to merely account for) linguistic generalizations. There is no easy answer to this question, and it is understandable that different theories put different degrees of emphasis on this goal. Whatever conclusion one draws from this recent HPSG/CG debate on the treatment of nonconstituent coordination, one point seems relatively uncontroversial: coordination continues to constitute a challenging empirical domain for any grammatical theory, consisting of both highly regular patterns, such as systematic interactions with scopal operators (Kubota & Levine 2015; 2020), and puzzling idiosyncrasies, the latter of which include the summative agreement facts (Postal 1998; Yatabe & Tam 2021) and extraposed relative clauses with split antecedents (Perlmutter & Ross 1970; Link 1984; Kiss 2005; Yatabe & Tam 2021).


# **4.2.2 Gapping and Stripping**

Descriptively, Gapping is a type of ellipsis phenomenon that occurs in coordination and deletes material including the main verb:<sup>24</sup>

	- b. Terry *can go* with me, and Pat ∅ with you.
	- c. John *wants to try to begin to write* a novel, and Mary ∅ a play.

Gapping has provoked some theoretical controversy in the recent HPSG/CG literature because of the "scope anomaly" it exhibits. The relevant data involving auxiliary verbs, such as (45a) and (45b), have long been known in the literature since Oehrle (1971; 1987) and Siegel (1987). McCawley (1993: 247) later pointed out similar examples involving downward-entailing quantifiers of the sort exemplified by (45c).

	- b. Kim didn't play bingo or Sandy ∅ sit at home all evening.
	- c. No dog eats Whiskas or ∅ cat ∅ Alpo.

The issue here is that (45a), for example, has a reading in which the modal *can't* scopes over the conjunction ('it's not possible for Mrs. J to live in NY and Mr. J to live in LA at the same time'). This is puzzling, since such a reading wouldn't be predicted on the (initially plausible) assumption that Gapping sentences are interpreted by simply supplying the meaning of the missing material in the right conjunct.
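Schematically (my own rendering of the intended truth conditions, with *can't* as ¬◇), the two candidate construals of (45a) are:

$$
\begin{aligned}
\text{wide scope:}&\quad \neg\Diamond\bigl(\mathbf{live}(\mathbf{ny})(\mathbf{mrs.j})\wedge\mathbf{live}(\mathbf{la})(\mathbf{mr.j})\bigr)\\
\text{distributive:}&\quad \neg\Diamond\,\mathbf{live}(\mathbf{ny})(\mathbf{mrs.j})\wedge\neg\Diamond\,\mathbf{live}(\mathbf{la})(\mathbf{mr.j})
\end{aligned}
$$

Simply restoring the missing auxiliary in the second conjunct predicts only the distributive construal; the attested wide-scope construal is the puzzle.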

Kubota & Levine (2016a) and Kubota & Levine (2020: Section 3.1) note some difficulties for earlier accounts of Gapping in the (H)PSG literature (Sag et al. 1985; Abeillé et al. 2014) and argue for a constituent coordination analysis of Gapping in TLCG, building on earlier analyses of Gapping in CG (Steedman 1990; Hendriks 1995b; Morrill & Solias 1993). The key idea of Kubota & Levine's analysis involves taking Gapping as coordination of clauses missing a verb in the middle, which can be transparently represented as a function from strings to strings of category S↾((NP\S)/NP) (for (44a), for example):

(46) λφ.leslie ◦ φ ◦ a ◦ cd; λR.∃y.**cd**(y) ∧ R(y)(**l**); S↾((NP\S)/NP)

<sup>24</sup>There is some disagreement as to whether Gapping is restricted to coordination. Kubota & Levine (2016a), following authors such as Johnson (2009), take Gapping to be restricted to coordination. Park et al. (2019) and Park (2019) take a different view, and argue that Gapping should be viewed as a type of ellipsis phenomenon that is not restricted to coordination environments. See Kubota & Levine (2020: 46–47) for a response to Park et al. (2019).


A special type of conjunction entry (prosodically of type (**st**→**st**)→(**st**→**st**)→(**st**→**st**)) then conjoins two such expressions and returns a conjoined sentence missing the verb only in the first conjunct (on the prosodic representation). By feeding the verb to this resultant expression, a proper form-meaning pair is obtained for Gapping sentences like those in (44).
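As a concrete illustration of this last step in the simple, non-coordinated case (my own sketch, assuming for illustration that the missing verb is *bought*, with entry bought; **buy**; (NP\S)/NP), feeding the verb to (46) is just function application along the vertical slash:

$$
\begin{aligned}
&(\lambda\varphi.\,\textsc{leslie}\circ\varphi\circ\textsc{a}\circ\textsc{cd})(\textsc{bought})=\textsc{leslie}\circ\textsc{bought}\circ\textsc{a}\circ\textsc{cd}\\
&(\lambda R.\exists y.\mathbf{cd}(y)\wedge R(y)(\mathbf{l}))(\mathbf{buy})=\exists y.\mathbf{cd}(y)\wedge\mathbf{buy}(y)(\mathbf{l})
\end{aligned}
$$

In the coordinated case, the same application step targets the conjoined expression instead.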

The apparently unexpected wide scope readings for auxiliaries and quantifiers in (45) turn out to be straightforward on this analysis. I refer the interested reader to Kubota & Levine (2016a) (and Kubota & Levine (2020: Chapter 3)) for details, but the key idea is that the apparently anomalous scope in such examples isn't really anomalous on this approach, since the auxiliary (which prosodically lowers into the first conjunct) takes the whole conjoined gapped clause as its argument in the combinatoric component underlying semantic interpretation.<sup>25</sup> Thus, the existence of the wide scope reading is automatically predicted. Puthawala (2018) extends this approach to similar "scope anomaly" data found in Stripping, in examples such as the following:

(47) John didn't sleep, or Mary (either).

Just like the Gapping examples in (45), this sentence has both wide scope ('neither John nor Mary slept') and narrow scope ('John was the one who didn't sleep, or maybe that was Mary') interpretations for negation.

The determiner gapping example in (45c) requires a somewhat more elaborate treatment. Kubota & Levine (2016a) analyze determiner gapping via higher-order functions. Morrill & Valentín (2017) criticize this approach for a certain type of overgeneration problem regarding word order and propose an alternative analysis in Displacement Calculus.

Park et al. (2019) and Park (2019) propose an analysis of Gapping in HPSG that overcomes the limitations of previous (H)PSG analyses (Sag et al. 1985: Section 4.3; Chaves 2009; Abeillé et al. 2014), couched in Lexical Resource Semantics. In Park et al.'s analysis, the lexical entries of the clause-level conjunction words *and* and *or* are underspecified as to the relative scope between the propositional operator contributed by the modal auxiliary in the first conjunct and the Boolean conjunction or disjunction connective that is contributed by the conjunction word itself. Park et al. argue that this is sufficient for capturing the scope anomaly in the Oehrle/Siegel data such as (45a) and (45b). Extension to the determiner gapping case (45c) is left for future work.

Here again, instead of trying to settle the debate, I'd like to draw the reader's attention to the different perspectives on grammar that seem to be behind the HPSG and (Hybrid) TLCG approaches. Kubota & Levine's approach attains theoretical elegance at the cost of employing abstract higher-order operators (both in semantics and prosody). This makes the relationship between the competence grammar and the on-line human sentence processing model indirect, and, relatedly, it is likely to make efficient computational implementation less straightforward (for discussion of the relationship between competence grammar and a model of sentence processing, see Wasow 2021, Chapter 24 of this volume and Borsley & Müller 2021: Section 5.1, Chapter 28 of this volume). Park et al.'s (2019) approach, on the other hand, is more in line with the usual practice (and the shared spirit) of HPSG research, where the main emphasis is on writing an explicit grammar fragment that is constraint-based and surface-oriented. This type of tension is perhaps not easy to overcome, but it seems useful (for researchers working in different grammatical theories) to at least recognize (and appreciate) the existence of these different theoretical orientations tied to different approaches.

<sup>25</sup>This is essentially a formalization of an idea that goes back to Siegel's (1987) work.

### **4.2.3 Ellipsis**

Analyses of major ellipsis phenomena in HPSG and CG share the same essential idea that ellipsis is a form of anaphora, without any invisible hierarchically structured representations corresponding to the "elided" expression. See Nykiel & Kim (2021), Chapter 19 of this volume and Ginzburg & Miller (2018) for an overview of approaches to ellipsis in HPSG.

Recent analyses of ellipsis in HPSG (Ginzburg & Sag 2000: Chapter 8; Miller 2014) make heavy use of the notion of "construction" adopted from Construction Grammar (this idea is even borrowed into some of the CG analyses of ellipsis such as Jacobson 2016). Many ellipsis phenomena are known to exhibit some form of syntactic sensitivity (Kennedy 2003; Chung 2013; Yoshida et al. 2015), and this fact has long been taken to provide strong evidence for the "covert structure" analyses of ellipsis popular in Mainstream Generative Grammar (Merchant 2019).

Some of the early works on ellipsis in CG include Hendriks (1995a) and Morrill & Merenciano (1996). Morrill & Merenciano (1996) in particular show how hypothetical reasoning in TLCG allows treatments of important properties of ellipsis phenomena such as strict/sloppy ambiguity and the scope ambiguity of elided quantifiers in VP ellipsis. Jäger (2005) integrates these earlier works with a general theory of anaphora in TLCG, incorporating the key empirical analyses of pronominal anaphora by Jacobson (1999; 2000). Jacobson's (1998; 2008) analysis of Antecedent-Contained Ellipsis is also important. Antecedent-Contained Ellipsis is often taken to provide a strong piece of evidence for the representational analysis of ellipsis in Mainstream Generative Syntax. Jacobson offers a counterproposal to this standard analysis that completely dispenses with covert structural representations. While the above works from the 90s mostly focused on VP ellipsis, recent developments in the CG literature, including Barker (2013) on sluicing, Jacobson (2016) on fragment answers and Kubota & Levine (2017) on pseudogapping, have considerably extended the empirical coverage of the same line of analysis.

The relationship between recent CG analyses of ellipsis and their HPSG counterparts seems to be similar to the situation with competing analyses of coordination. Both Barker (2013) and Kubota & Levine (2017) exploit hypothetical reasoning to treat the antecedent of elided material as a "constituent" with full-fledged semantic interpretation at an abstract combinatoric component of syntax. The anaphoric mechanism can then refer to both the syntactic and semantic information of the antecedent expression to capture the syntactic sensitivity observed in ellipsis phenomena, without the need to posit hierarchical representations at the ellipsis site. Due to its surface-oriented nature, HPSG is not equipped with an analogous abstract combinatoric component that assigns "constituent" status to expressions that do not (in any obvious sense) correspond to constituents in the surface representation. In HPSG, the major work in restricting the possible forms of ellipsis is instead taken over by constructional schemata, which can encode syntactic information of the antecedent to capture connectivity effects, as is done, for example, with the use of the SAL-UTT feature in Ginzburg & Sag's (2000: Chapter 8) analysis of sluicing (cf. Nykiel & Kim 2021, Chapter 19 of this volume).

Kubota & Levine (2020: Chapter 8) extend Kubota & Levine's (2017) approach further to the treatment of interactions between VP ellipsis and extraction, which has often been invoked in the earlier literature (in particular, Kennedy 2003) as providing crucial evidence for covert structure analysis of ellipsis phenomena (see also Jacobson 2018 for a related proposal, cast in a variant of CCG). At least some of the counterproposals that Kubota & Levine formulate in their argument against the covert structure analysis seem to be directly compatible with the HPSG approach to ellipsis, but (so far as I am aware) no concrete analysis of extraction/ellipsis interaction currently exists in the HPSG literature.

### **4.2.4 Mismatches in right-node raising**

While right-node raising (RNR) has mostly been discussed in connection with coordination in the literature, it is well known that RNR is not necessarily restricted to coordination environments (see, for example, Wilder 2018 for a recent overview). Moreover, it has recently been pointed out by Abeillé et al. (2016) and Shiraïshi et al. (2019) that RNR admits certain types of syntactic mismatch between the RNR'ed material and the selecting head in a non-adjacent conjunct.


The current literature seems to agree that RNR is not a unitary phenomenon, and that at least some type of RNR should be treated via a mechanism of surface ellipsis, which could be modeled as deletion of syntactic (or prosodic) objects or via some sort of anaphoric mechanism (cf. Nykiel & Kim 2021: Section 6.2, Chapter 19 of this volume, Chaves 2014, Shiraïshi et al. 2019; see also Kubota & Levine 2017: footnote 15).

One point that is worth emphasizing in this connection is that while the "NCC as constituent coordination" analysis of RNR in CG discussed in Section 2.4.1 (major evidence for which comes from the interactions between various sorts of scopal operators and RNR as noted in Section 4.2.1) is well-known, neither CCG nor TLCG is by any means committed to the idea that *all* instances of RNR should be analyzed this way. In fact, given the extensive evidence for the non-unitary nature of RNR reviewed in Chaves (2014) and the syntactic mismatch data from French offered by Abeillé et al. (2016) and Shiraïshi et al. (2019), it seems that a comprehensive account of RNR in CG (or, for that matter, in any other theory) would need to recognize the non-unitary nature of the phenomenon, along lines similar to Chaves's (2014) recent proposal in HPSG. While there is currently no detailed comprehensive account of RNR along these lines in the CG literature, there does not seem to be any inherent obstacle to formulating such an account.

# **4.3 Binding**

Empirical phenomena that have traditionally been analyzed by means of Binding Theory (both in the transformational and the non-transformational literature; cf. Müller 2021a, Chapter 20 of this volume) potentially pose a major challenge to the "non-representational" view of the syntax-semantics interface common to most variants of CG. The HPSG Binding Theory in Pollard & Sag (1992; 1994: Chapter 6) captures Principles A and B at the level of argument structure, while Principle C makes reference to the configurational structure (i.e. the feature-structure encoding of the constituent geometry). The status of Principle C itself is controversial to begin with, but if this condition needs to be stated in the syntax, it would possibly constitute one of the greatest challenges to CG-based theories of syntax, since, unlike phrase structure trees, the proof trees in CG are not objects that a principle of grammar can directly refer to.

While there seems to be no consensus in the current CG literature on how the standard facts about binding theory are to be accounted for, there are some important ideas and proposals in the wider literature of CG-based syntax (broadly construed to include work in the Montague Grammar tradition). First, as for Principle A, there is a recurrent suggestion in the literature that these effects can (and should) be captured simply via strictly lexical properties of reflexive pronouns (e.g. Keenan 1988; Szabolcsi 1992; see Büring 2005: 43–44 for a concise summary). For example, for a reflexive in the direct object position of a transitive verb bound by the subject NP, the following type assignment (where the reflexive pronoun first takes a transitive verb and then the subject NP as arguments) suffices to capture its bound status:

# (48) himself; λP.λx.P(x)(x); ((NP\S)/NP)\(NP\S)

This approach is attractively simple, but there are at least two things to keep in mind, in order to make it a complete analysis of Principle A in CG. First, while this lexical treatment of reflexive binding may at first sight appear to capture the locality of binding quite nicely, CG's flexible syntax potentially overgenerates unacceptable long-distance binding readings for (English) reflexives. Since RNR can take place across clause boundaries, it seems necessary to assume that hypothetical reasoning for the Lambek-slash (or a chain of Function Composition that has the same effect in CCG) can generally take place across clause boundaries. But then, expressions such as *thinks Bill hates* can be assigned the same syntactic type (i.e. (NP\S)/NP) as lexical transitive verbs, overgenerating non-local binding of a reflexive from a subject NP in the upstairs clause (\**John<sub>i</sub> thinks Bill hates himself<sub>i</sub>*).
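To see the worry concretely, here is a sketch of the problematic derived sign (my own illustration, assuming standard entries thinks; **think**; (NP\S)/S and hates; **hate**; (NP\S)/NP): hypothesizing an NP object and withdrawing the hypothesis at the right periphery yields a transitive-verb-like expression:

$$
\begin{aligned}
&[\varphi;x;NP]^1 \;\Rightarrow\; \textsc{hates}\circ\varphi : NP\backslash S \;\Rightarrow\; \textsc{bill}\circ\textsc{hates}\circ\varphi : S\\
&\;\Rightarrow\; \textsc{thinks}\circ\textsc{bill}\circ\textsc{hates}\circ\varphi : NP\backslash S\\
&\;\Rightarrow\; \textsc{thinks}\circ\textsc{bill}\circ\textsc{hates} : (NP\backslash S)/NP\qquad\text{(withdrawing the hypothesis }[\,\cdot\,]^1)
\end{aligned}
$$

This sign has exactly the type that the reflexive entry in (48) seeks, which is what licenses the unwanted non-local binding.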

In order to prevent this situation while still retaining the lexical analysis of reflexivization sketched above, some kind of restriction needs to be imposed as to the way in which reflexives combine with other linguistic expressions. One possibility would be to distinguish between lexical transitive verbs and derived transitive verb-like expressions by positing different "modes of composition" in the two cases in a "multi-modal" version of CG.

The other issue is that the lexical entry in (48) needs to be generalized to cover all cases in which a reflexive is bound by an argument that is higher in the obliqueness hierarchy. This amounts to positing a polymorphic lexical entry for the reflexive. The use of polymorphism is not itself a problem, since it is needed in other places in the grammar (such as coordination) anyway. But this account would amount to capturing the Principle A effects purely in terms of the specific lexical encoding for reflexive pronouns (unlike the treatment in HPSG which explicitly refers to the obliqueness hierarchy).

While Principle A effects are in essence amenable to a relatively simple lexical treatment along the lines sketched above, Principle B turns out to be considerably more challenging for CG. To see this point, note that the lexical analysis of reflexives sketched above crucially relies on the fact that the constraint associated with reflexives corresponds to a straightforward semantic effect of variable binding. Pronouns instead require *disjointness* of reference from less oblique coarguments, but such an effect cannot be captured by simply specifying some appropriate lambda term as the semantic translation for the pronoun.

To date, the most detailed treatment of Principle B effects in CG that explicitly addresses this difficulty is the proposal by Jacobson (2008), formulated in a version of CCG (Steedman 1996 proposes a different approach to binding, which will be briefly discussed at the end of this section). The key idea of Jacobson's account of Principle B effects is that NPs are divided by a binary-valued feature ±p, with pronouns marked NP[+p] and all other NPs NP[−p]. In all lexical entries of the form in (49), all NP (and PP) arguments in any realization of /\$ are specified as [−p].<sup>26</sup>

(49) *w*<sub>k</sub>; *f*<sub>k</sub>; VP/\$

The effect of this restriction is to rule out pronouns from argument positions of verbs with ordinary semantic denotations. On this approach, the only way a lexically specified functional category can take [+p] arguments is via the application of the following irreflexive operator:<sup>27</sup>

# (50) λφ.φ; λf.λu.λv.f(u)(v), u ≠ v; (VP/NP[+p])↾(VP/NP[−p])

The greyed-in part u ≠ v, separated from the truth-conditional meaning by a comma, is a presupposition introduced by the pronoun-seeking variant of the predicate. It says that the subject and object arguments are forced to pick out different objects in the model. For the semantics of pronouns themselves, one can assume, following the standard practice, that free (i.e. unbound) pronouns are simply translated as arbitrary variables (cf. Cooper 1979).

<sup>26</sup>Here, /\$ is an abbreviation of a sequence of argument categories sought via /. Thus, VP/\$ can be instantiated as VP/NP, VP/NP/NP, VP/PP/NP, etc.

<sup>27</sup>For expository purposes, I state the operator in (50) in its most restricted form, dealing with only the case where there is a single syntactic argument apart from the subject. A much broader coverage is of course necessary in order to handle cases like the following:

	- b. \* John talked to Mary<sub>i</sub> about her<sub>i</sub>.
	- c. \* John<sub>i</sub> explained himself<sub>i</sub> to him<sub>i</sub>.

What is needed in effect is a schematic type specification that applies to a pronoun in any or all argument positions, i.e., one stated on an input of the form VP/\$/XP[−p]/\$ to yield an output of the form VP/\$/XP[+p]/\$. To ensure the correct implementation of this extension, some version of the "wrapping" analysis needs to be assumed (cf. Jacobson 2008: 194), so that the order of the arguments in verbs' lexical entries is isomorphic to the obliqueness hierarchy (of the sort discussed by Pollard & Sag 1992). Cases such as the following also call for an extension (also a relatively straightforward one):

(ii) \* John<sub>i</sub> is proud of him<sub>i</sub>.

By assuming (following Jacobson 2008) that the [±p] feature percolates from NPs to PPs and by generalizing the irreflexive operator still further so that it applies not just to VP/XP[−p] but to AP/XP[−p] as well, the ungrammaticality of (ii) follows straightforwardly.

Crucially, the operator in (50) is restricted in its domain of application to the set of signs which are specified in the lexicon. I notate this restriction by using the dashed line notation in what follows. Then (51) will be derived as in (52).

(51) John praises him.

(52)
λf.λu.λv.f(u)(v), u ≠ v; (VP/NP[+p])↾(VP/NP[−p])      praises; **praise**; VP/NP[−p]
‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐ ‐
praises; λu.λv.**praise**(u)(v), u ≠ v; VP/NP[+p]      him; x; NP[+p]
────────────────────────────────────────────
praises ◦ him; λv.**praise**(x)(v), x ≠ v; VP      john; **j**; NP
────────────────────────────────────────────
john ◦ praises ◦ him; **praise**(x)(**j**), x ≠ **j**; S

The presupposition x ≠ **j** ensures that the referent of the pronoun is different from John.

Thus, Jacobson's approach captures the relevant conditions on the interpretation of pronouns essentially as a type of lexical presupposition tied to the denotation of the pronoun-taking verb, and the syntactic feature [±p] mediates the distributional correlation between the pronoun and the verb that subcategorizes for it. The idea is essentially the same as in the HPSG Binding Theory, except that the relevant condition is directly encoded as a restriction on the denotation itself, since the standard CG syntax-semantics interface does not admit of syntactic indices of the sort assumed in HPSG.
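As a rough computational rendering of this idea (entirely my own sketch, not Jacobson's formalization; the presupposition is crudely modeled as a runtime check), the irreflexive operator can be viewed as a function from [−p]-seeking lexical entries to [+p]-seeking ones:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Entry:
    phon: str        # prosodic form
    cat: str         # e.g. "VP/NP[-p]"
    sem: Callable    # curried meaning: object -> subject -> proposition

# Hypothetical lexical entry: a transitive verb seeking a [-p] object.
praise = Entry("praises", "VP/NP[-p]", lambda u: lambda v: ("praise", u, v))

def irreflexive(e: Entry) -> Entry:
    """(50): derive a [+p]-seeking variant whose meaning presupposes
    that the object and subject arguments are distinct."""
    assert e.cat.endswith("NP[-p]")
    def sem(u):
        def body(v):
            if u == v:   # the presupposition u != v, as a hard check
                raise ValueError("presupposition failure: coarguments not disjoint")
            return e.sem(u)(v)
        return body
    return Entry(e.phon, e.cat.replace("[-p]", "[+p]"), sem)

praises_p = irreflexive(praise)
print(praises_p.cat)             # VP/NP[+p]
print(praises_p.sem("x")("j"))   # ('praise', 'x', 'j'), presupposing x != j
```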

Unlike Jacobson's proposal outlined above, Steedman's (1996: Chapter 2) analysis of binding conditions in CCG recognizes the syntactic forms of the logical language that is used to write the denotations of linguistic expressions as the "level" at which binding conditions are stated. This approach can be thought of as a "compromise" which enables a straightforward encoding of the HPSG-style Binding Conditions by (slightly) deviating from the CG doctrine of not admitting any representational object at the syntax-semantics interface (see Dowty 1997 for a critique of the approach to binding by Steedman 1996 discussing this issue clearly).


Steedman's approach can be best illustrated by taking a look at the analysis of (53).<sup>28</sup>

(53) \* Every student<sub>i</sub> praised him<sub>i</sub>.

According to Steedman, pronouns receive translations of the form **pro**(x) (for x a variable), where **pro** is effectively a term that marks the presence of (the translation of) a pronoun at some particular syntactic position in the logical formula that represents the meaning of the sentence.

With this assumption, the translation for (53) that needs to be ruled out (via Principle B) is as follows:

(54) ∀x[**student**(x) → **praise**(**pro**(x))(x)]

And this is where the CCG Binding Theory kicks in. The relevant part of the structure of the logical formula in (54) can be more perspicuously written as a tree as in Figure 2, which makes clear the hierarchical relations between sub-terms.<sup>29</sup> Principle B states that pronouns need to be locally free. Figure 2 violates this condition, since there is a locally c-commanding term that binds **pro**(x) (where one term binds another when the two are semantically bound by the same operator).

Figure 2: Logical formula as a tree

<sup>28</sup>At the same time that he formulates an essentially syntactic account of Principle B via the term **pro** in the translation language, Steedman (1996: 29) briefly speculates on the (somewhat radical) possibility of relegating Principle B entirely to the pragmatic component of pronominal anaphora resolution. However, the relevant discussion is rather sketchy, and the details of such a pragmatic alternative are not entirely clear.

<sup>29</sup>Since binding conditions are stated at the level of the translation language, this approach raises the issue of whether it can correctly capture the binding relations in constructions in which there is a mismatch between the surface argument structure and the underlying semantics, such as in subject-to-object raising constructions (*John believes himself to be a descendant of Beethoven*). Steedman (1996) does not contain an explicit discussion of this type of data, but it seems likely that one will need to assume a particular syntax for the translation language in order to accommodate this type of data in his approach.

Principles A and C are formulated similarly by making crucial reference to the structures of the terms that represent the semantic translations of sentences.
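The flavor of this check can be conveyed with a toy sketch (entirely my own illustration; Steedman states the condition directly over the translation language): represent the formula as a term tree and flag any **pro** term whose variable recurs as a co-argument of the same predicate:

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    head: str
    args: list = field(default_factory=list)

# (54): forall x [student(x) -> praise(pro(x))(x)], with the relevant
# sub-term praise(pro(x))(x) represented as a flat application node.
body = Term("praise", [Term("pro", [Term("x")]), Term("x")])

def violates_principle_b(t: Term) -> bool:
    """A pro-term pro(v) must be locally free: no co-argument of the same
    predicate may be the very variable v (a crude stand-in for 'bound by
    the same operator in a locally c-commanding position')."""
    for a in t.args:
        if a.head == "pro":
            v = a.args[0].head
            if any(b.head == v for b in t.args if b is not a):
                return True
    return any(violates_principle_b(a) for a in t.args)

print(violates_principle_b(body))   # True: pro(x) is bound by co-argument x
```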

What one can see from the comparison of different approaches to binding in CG and the treatment of binding in HPSG is that although HPSG and CG are both lexicalist theories of syntax, and there is a general consensus that binding conditions are to be formulated lexically rather than configurationally, there are important differences in the actual implementations of the conditions between approaches that stick to the classical Montagovian tradition (embodying the tenet of "direct compositionality" in Jacobson's terms) and those that make use of (analogues of) representational devices more liberally.

Finally, some comments are in order regarding the status of Principle C, the part of Binding Theory that is supposed to rule out examples such as the following:

	- b. \* He<sub>i</sub> talked to John<sub>i</sub>'s brother.

The formulation of Principle C has always been a problem in lexicalist theories of syntax. While Principles A and B can be stated by just making reference to the local argument structure of a predicate in the lexicon, the global nature of Principle C seems to require looking at the whole configurational structure of the sentence in which the referring term appears (but see Branco (2002) for an alternative view; see also Müller (2021a), Chapter 20 of this volume). In fact, Pollard & Sag (1992; 1994: Chapter 6) opt for this solution, and their definition of Principle C has a somewhat exceptional status within the whole theory (which otherwise adheres to strict locality conditions) in directly referring to the configurational structure.

Essentially the same problem arises in CG. Steedman's (1996) formulation of Principle C can be thought of as an analog of Pollard & Sag's (1992; 1994) proposal, where global reference to hierarchical structure is made not at the level of phrase structure, but instead at the level of "logical structure", that is, in the syntactic structure of the logical language used for writing the meanings of natural language expressions. As already noted above, if one takes the Montagovian, or "direct compositional", view of the syntax-semantics interface that is more traditional/standard in CG research, this option is unavailable.

Thus, Principle C has a somewhat cumbersome place within lexicalist theories in general. However, unlike Principles A and B, the status of Principle C in the grammar is still considerably unclear and controversial to begin with (see Büring 2005: 122–124 for some discussion on this point). In particular, it has been noted in the literature (Lasnik 1986) that there are languages such as Thai and Vietnamese that do not show Principle C effects. If, as suggested by some authors (cf., e.g., Levinson 1987; 1991), the effects of Principle C can be accounted for by pragmatic principles, that would remove one major sticking point in both the HPSG and CG formulations of the Binding Theory.

# **5 A brief note on processing and implementation**

The discussion above has mostly focused on linguistic analysis. In this final section, I will briefly comment on implications for psycholinguistics and computational linguistics research.

As should already be clear from the above discussion, different variants of both HPSG and CG make different assumptions about the relationship between the competence grammar and theories of performance. To make things even more complicated, such assumptions are often implicit. As a first approximation, it is probably fair to say that HPSG (at least the "bare-bones" version of it) and CCG are more similar to each other than they are to TLCG in being surface-oriented. TLCG makes heavy use of hypothetical reasoning in the analyses of certain linguistic phenomena, and, as should already be clear at this point, the role it plays in the grammar is much like the role of movement operations in Mainstream Generative Grammar.

As repeatedly emphasized by practitioners of HPSG and CCG (see, for example, Sag & Wasow 2011, Steedman 2012: Section 13.7 and Wasow 2021, Chapter 24 of this volume), all other things being equal, it is preferable to make the relationship between the competence grammar and the model of performance as transparent as possible. It is unlikely that any reasonable researcher would deny such a claim, but it raises one big question: how exactly are we to understand the qualification "all other things being equal"? Practitioners of TLCG in general seem to have a somewhat more detached take on the relationship between competence and performance, and I believe the consensus there is more in line with (what seems to be) the spirit of Mainstream Generative Grammar: the goal is to clarify the most fundamental principles of grammar and state them in the simplest form possible. TLCG subscribes to the thesis that (a certain variety of) logic is indeed the underlying principle of the grammar of natural language. This is an attractive view, but at the same time language exhibits phenomena that suggest that pushing this perspective to the limit is unlikely to be the most fruitful research strategy. The right approach is probably one that combines the insights of both surface-oriented approaches (such as HPSG and CCG) and more abstract approaches (such as TLCG and Mainstream Generative Grammar).

At a more specific level, one attractive feature of CCG (but not CG in general), when viewed as an integrated model of the competence grammar and human sentence processing, is that it enables surface-oriented, incremental analyses of strings from left to right. This aspect was emphasized in the early literature of CCG (Ades & Steedman 1982; Crain & Steedman 1985), but it does not seem to have had much impact on psycholinguistic research in general since then. A notable exception is the work by Pickering & Barry (1991; 1993) in the early 90s. There is also some work on the relationship between processing and TLCG (see Morrill 2011: Chapters 9 and 10, and references therein). In any event, a serious investigation of the relationship between competence grammar and human sentence processing from a CG perspective (either CCG or TLCG) is a research topic that is waiting to be explored, much like the situation with HPSG (see Wasow 2021, Chapter 24 of this volume).

As for connections to computational linguistics (CL)/natural language processing (NLP) research, like HPSG (cf. Bender & Emerson 2021, Chapter 25 of this volume), large-scale computational implementation has been an important research agenda for CCG (see, for example, White & Baldridge 2003; Clark & Curran 2007). I refer the reader to Steedman (2012: Chapter 13) for an excellent summary on this subject (this chapter contains a discussion of human sentence processing as well). Together with work on linguistically informed parsing in HPSG, CCG parsers seem to be attracting some renewed interest in CL/NLP research recently, due to the new trend of combining the insights of statistical approaches and linguistically-informed approaches. In particular, the straightforward syntax-semantics interface of (C)CG is an attractive feature in building CL/NLP systems that have an explicit logical representation of meaning. See, for example, Lewis & Steedman (2013) and Mineshima et al. (2016) for this type of work. TLCG research has traditionally been less directly related to CL/NLP research. But there are recent attempts at constructing large-scale treebanks (Moot 2015) and combining TLCG frameworks with more mainstream approaches in NLP research such as distributional semantics (Moot 2018).

# **6 Conclusion**

As should be clear from the above discussion, HPSG and CG share many important similarities, mainly due to the fact that they are both variants of lexicalist syntactic theories. This is particularly clear in the analyses of local dependencies in terms of lexically encoded argument structure information. Important differences emerge once one turns one's attention to less canonical types of phenomena, such as atypical types of coordination (nonconstituent coordination, Gapping) and the treatment of "constructional" patterns that are not easily lexicalizable. In general, HPSG has a richer and more comprehensive treatment of various empirical phenomena, whereas CG has a lot to offer to grammatical theory (perhaps somewhat paradoxically) due to the very fact that the potential of the logic-based perspective it embodies has not yet been explored in full detail. It is more likely than not that the two will continue to develop as distinct theories of natural language syntax (and semantics). I hope that the discussion in the present chapter has made it clear that there are still many occasions for fruitful interaction between the two approaches, both at the level of analytic ideas for specific empirical phenomena and at the more general, foundational level pertaining to the overall architecture of grammatical theory.

# **Acknowledgments**

I'd like to thank Jean-Pierre Koenig, Bob Levine and Stefan Müller for comments. This work is supported by the NINJAL collaborative research project "Crosslinguistic Studies of Japanese Prosody and Grammar".

# **References**





Beavers, John & Ivan A. Sag. 2004. Coordinate ellipsis and apparent non-constituent coordination. In Stefan Müller (ed.), *Proceedings of the 11th International Conference on Head-Driven Phrase Structure Grammar*, 48–69. Stanford, CA: CSLI Publications. http://csli-publications.stanford.edu/HPSG/2004/beavers-sag.pdf (10 February, 2021).



Dowty, David. 1997. Non-constituent coordination, wrapping, and multimodal categorial grammars. In Maria Luisa Dalla Chiara, Kees Doets, Daniele Mundici & Johan van Benthem (eds.), *Structures and norms in science* (Synthese Library 260), 347–368. Dordrecht: Springer Verlag. DOI: 10.1007/978-94-017-0538-7\_21.



Kaplan, Ronald M. & Joan Bresnan. 1982. Lexical-Functional Grammar: A formal system for grammatical representation. In Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III & Annie Zaenen (eds.), *Formal issues in Lexical-Functional Grammar* (CSLI Lecture Notes 47), 29–130. Stanford, CA: CSLI Publications, 1995.



Levine, Robert D., Thomas E. Hukari & Michael Calcagno. 2001. Parasitic gaps in English: Some overlooked cases and their theoretical implications. In Peter W. Culicover & Paul M. Postal (eds.), *Parasitic gaps* (Current Studies in Linguistics 35), 181–222. Cambridge, MA: MIT Press.



Sag, Ivan A. & Thomas Wasow. 2011. Performance-compatible competence grammar. In Robert D. Borsley & Kersti Börjars (eds.), *Non-transformational syntax: Formal and explicit models of grammar: A guide to current models*, 225–267. Oxford: Wiley-Blackwell. DOI: 10.1002/9781444395037.




Zaenen, Annie. 1983. On syntactic binding. *Linguistic Inquiry* 14(3). 469–504.

# **Chapter 30**

# **HPSG and Lexical Functional Grammar**

Stephen Wechsler The University of Texas

# Ash Asudeh

University of Rochester & Carleton University

> This chapter compares two closely related grammatical frameworks, Head-Driven Phrase Structure Grammar (HPSG) and Lexical Functional Grammar (LFG). Among the similarities: both frameworks draw a lexicalist distinction between morphology and syntax, both associate certain words with lexical argument structures, both employ semantic theories based on underspecification, and both are fully explicit and computationally implemented. The two frameworks make available many of the same representational resources. Typical differences between the analyses proffered under the two frameworks can often be traced to concomitant differences of emphasis in the design orientations of their founding formulations: while HPSG's origins emphasized the formal representation of syntactic locality conditions, those of LFG emphasized the formal representation of functional equivalence classes across grammatical structures. Our comparison of the two theories includes a point-by-point syntactic comparison, after which we turn to an exposition of Glue Semantics, a theory of semantic composition closely associated with LFG.

# **1 Introduction**

Head-Driven Phrase Structure Grammar is similar in many respects to its sister framework, Lexical Functional Grammar or LFG (Bresnan et al. 2016; Dalrymple et al. 2019). Both HPSG and LFG are lexicalist frameworks in the sense that they distinguish between the morphological system that creates words and the syntax proper that combines those fully inflected words into phrases and sentences.

Stephen Wechsler & Ash Asudeh. 2021. HPSG and Lexical Functional Grammar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1395–1446. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599878


Both frameworks assume a lexical theory of argument structure (Müller & Wechsler 2014; compare also Davis, Koenig & Wechsler 2021, Chapter 9 of this volume) in which verbs and other predicators come equipped with valence structures indicating the kinds of complements that the word is to be combined with. Both theories treat certain instances of control (equi) and raising as lexical properties of control or raising predicates (on control and raising in HPSG see Abeillé 2021, Chapter 12 of this volume). Both theories allow phonologically empty nodes in the constituent structure, although researchers in both theories tend to avoid them unless they are well-motivated (Sag & Fodor 1995; Berman 1997; Dalrymple et al. 2019: 734–742). Both frameworks use recursively embedded attribute-value matrices (AVMs). These structures directly model linguistic expressions in LFG, but are understood in HPSG as grammatical descriptions satisfied by the directed acyclic graphs that model utterances.<sup>1</sup>

There are also some differences in representational resources, especially in the original formulations of the two frameworks. But each framework now exists in many variants, and features that were originally exclusive to one framework can often now be found in some variant of the other. HPSG's valence lists are ordered, while those of LFG usually are not, but Andrews & Manning (1999) use an ordered list of terms (subject and objects) in LFG. LFG represents grammatical relations in a *functional structure* or f-structure that is autonomous from the constituent structure, while HPSG usually lacks anything like a functional structure. But in Bender's (2008) version of HPSG, the COMPS list functions very much like LFG's f-structure (see Section 10 below). This chapter explores the utility of various formal devices for the description of natural language grammars, but since those devices are not exclusively intrinsic to one framework, this discussion does not bear on the comparative utility of the frameworks themselves. For a comparison of the representational architectures and formal assumptions of the two theories, see Przepiórkowski (2021), which complements this chapter.

We start with a look at the design considerations guiding the development of the two theories, followed by point-by-point comparisons of syntactic issues organized by grammatical topic. Then we turn to the semantic system of LFG, beginning with a brief history by way of explaining its motivations, and continuing with a presentation of the semantic theory itself. A comparison with HPSG semantics is impractical and will not be attempted here, but see Koenig & Richter (2021), Chapter 22 of this volume for an overview of HPSG semantics.

<sup>1</sup>Both frameworks are historically related to unification grammar (Kay 1984). Unification is defined as an operation on feature structures: the result of unifying two mutually consistent feature structures is a feature structure containing all and only the information in the original two feature structures. Neither framework actually employs a unification operation, but unification can produce structures resembling the ones in use in LFG and HPSG.
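To make the operation described in this footnote concrete, the following is a minimal sketch (our own illustration, with feature structures modeled as nested dictionaries and reentrancies ignored) of unification in this sense:

```python
# Minimal feature-structure unification: merge two structures, failing on
# any inconsistency; the result contains all and only their information.

def unify(fs1, fs2):
    if not isinstance(fs1, dict) or not isinstance(fs2, dict):
        return fs1 if fs1 == fs2 else None     # atomic values must match
    result = dict(fs1)
    for attr, val in fs2.items():
        if attr in result:
            sub = unify(result[attr], val)
            if sub is None:                    # feature clash
                return None
            result[attr] = sub
        else:
            result[attr] = val
    return result

print(unify({"NUM": "sg"}, {"PERS": "3"}))     # {'NUM': 'sg', 'PERS': '3'}
print(unify({"NUM": "sg"}, {"NUM": "pl"}))     # None: mutually inconsistent
```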


# **2 Design principles in the origins of HPSG and LFG**

The HPSG and LFG frameworks were originally motivated by rather different design considerations. In order to make a meaningful comparison between the frameworks, it is helpful to understand those differences.

HPSG grew out of the tradition, established by Chomsky (1957), of studying the computational properties of natural language syntax as a window onto the capabilities of human cognitive processes. The advent of Generalized Phrase Structure Grammar (Gazdar et al. 1985) was an important milestone in that tradition, bringing as it did the surprising prospect that the entire range of syntactic phenomena known to exist at the time could be described with a context-free grammar (CFG). If this could be maintained, it would mean that natural language is context-free in the sense of the Chomsky Hierarchy (Chomsky 1957), which is an answer to the question raised by Chomsky that was very different from, and more interesting than, Chomsky's own answer. Then Shieber (1985) and Culy (1985) showed that certain phenomena such as recursive cross-serial dependencies in Swiss German and Bambara exceeded the generative capacity of a context-free grammar.

Despite that development, it nonetheless appeared that languages for the most part hewed rather closely to the context-free design. Thus began the search for more powerful grammatical formalisms that would preserve the insight of a context-free grammar while allowing for certain phenomena that exceed that generative capacity.<sup>2</sup> HPSG grew out of this search. As Sag et al. (2003: 83) observe, HPSG "is still closely related to standard CFG". In fact, an HPSG grammar largely consists of constraints on local sub-trees (i.e., trees consisting of a node and its immediate daughters), which would make it a context-free grammar were it not that the nodes themselves are complex, recursively-defined feature structures. This CFG-like character of HPSG means that the framework itself has the potential to embody an interesting theory of locality. At the same time, the original theory also allowed for the description of non-local relations, and new non-local devices were added in versions of the theory developed later (a proposal to add functional structure was mentioned in Section 1, and others will be described below). The flexibility of HPSG thus provides for the study of locality and non-locality, while also allowing for grammatical description and theory construction by syntacticians with no interest in locality.

The architecture of LFG was originally motivated by rather different concerns: an interest in typological variation from a broadly functionalist perspective, from which one studies cross-linguistic variation in the expression of functionally equivalent elements of grammar. For this reason two levels of representation are discerned: a functional structure or *f-structure* representing the internal grammatical relations in a sentence that are largely invariant across languages, and a categorial constituent structure or *c-structure* representing the external morphological and syntactic expression of those relations, which varies, often rather dramatically, across different typological varieties of language. For example, probably all (or nearly all) languages have subjects and objects, hence those relations are represented in f-structure. But languages vary as to the mechanisms for signaling subjecthood and objecthood, the three main mechanisms being word order, head-marking, and dependent-marking (Nichols 1986), hence those mechanisms are distinguished at c-structure. The word *functional* in the name of the LFG framework is a three-way pun, referring to the grammatical *functions* that play such an important role in the framework, the mathematical *functions* that are the basis for the representational formalism, and the generally *functionalist*-friendly nature of the LFG approach.

<sup>2</sup>There were also some problems with the GPSG theory of the lexicon, in which complement selection was assimilated to the phrase structure grammar. For discussion see Müller & Wechsler (2014: Section 4.1).

Despite these differing design motivations, there is no dichotomy between the frameworks with respect to the actual research undertaken within the two research communities. Typological variation within almost every area of grammar has been studied in HPSG, and locality is studied within LFG by developing theories of the mapping between c-structure and f-structure (see Bresnan et al. 2016: 88–128). In the remainder of this chapter we will survey various phenomena and compare HPSG and LFG approaches.

# **3 Phrases and endocentricity**

A phrasal node shares certain grammatical features with specific daughters. In HPSG, this is accomplished by means of structure-sharing (reentrancies) in the immediate dominance schemata and other constraints on local sub-trees such as the Head Feature Principle. LFG employs essentially the same mechanism for feature sharing in a local sub-tree but implements it slightly differently, so as to better address the design motivations of the theory. Each node in a phrase structure is paired with an f-structure, which is formally a set of attribute-value pairs. It is through the f-structure that the nodes of the phrase structure share features. The phrase structure is referred to as *c-structure*, for categorial or constituent structure, in order to distinguish it from f-structure. Context-free phrase structure rules license c-structures, and the c-structure elements are annotated with *functional equations* which describe the corresponding f-structure. The correspondence function from c-structures to f-structures, φ, defines and constrains the f-structure on the basis of the equations collected from the c-structure annotations and from lexical entries of the terminal nodes.<sup>3</sup> For example, the phrase structure grammar in (1) and lexicon in (2) license the tree in Figure 1.<sup>4,5</sup>

$$
\begin{array}{lllll}
\text{(1)} & \text{a.} & \mathrm{S} \rightarrow & \mathrm{NP} & \mathrm{VP}\\
& & & (\uparrow\,\textsc{subj})=\downarrow & \uparrow=\downarrow\\[2pt]
& \text{b.} & \mathrm{NP} \rightarrow & (\mathrm{Det}) & \mathrm{N}\\
& & & \uparrow=\downarrow & \uparrow=\downarrow\\[2pt]
& \text{c.} & \mathrm{VP} \rightarrow & \mathrm{V} & (\mathrm{NP})\\
& & & \uparrow=\downarrow & (\uparrow\,\textsc{obj})=\downarrow
\end{array}
$$
 
$$
\begin{array}{llll}
\text{(2)} & \text{a.} & \textit{this:}\ \mathrm{Det} & (\uparrow\,\textsc{def})=+\\
& & & (\uparrow\,\textsc{num})=\textsc{sg}\\[2pt]
& \text{b.} & \textit{lion:}\ \mathrm{N} & (\uparrow\,\textsc{pred})=\text{'lion'}\\
& & & (\uparrow\,\textsc{pers})=3\\[2pt]
& \text{c.} & \textit{roar-:}\ \mathrm{V} & (\uparrow\,\textsc{pred})=\text{'roar}\langle\textsc{subj}\rangle\text{'}\\
& & \textit{-s:}\ \mathrm{Infl} & (\uparrow\,\textsc{tense})=\textsc{prs}\\
& & & (\uparrow\,\textsc{subj}\ \textsc{pers})=3\\
& & & (\uparrow\,\textsc{subj}\ \textsc{num})=\textsc{sg}
\end{array}
$$

Each node in the c-structure maps to an f-structure, that is, to a set of attribute-value pairs. Within the equations, the up and down arrows are metavariables over f-structure labels, interpreted as follows: the up arrow refers to the f-structure to which the c-structure mother node maps, and the down arrow refers to the f-structure that its own c-structure node maps to. To derive the f-structure from Figure 1, we instantiate the metavariables to specific f-structure names and solve for the f-structure associated with the root node (here, S). In Figure 2, the f-structure labels *f*<sub>1</sub>, *f*<sub>2</sub>, etc. are subscripted to the node labels. The arrows have been replaced with those labels.
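To make the solving step concrete, here is a minimal sketch in Python, not part of either formalism: instantiated defining equations (hypothetical triples loosely based on (1) and (2), with assumed names `f1` and `f2`) are asserted one by one into attribute-value structures, failing on any clash.

```python
# A minimal sketch (not LFG's or HPSG's actual machinery) of solving
# an f-description: instantiated defining equations are asserted one
# by one into attribute-value structures, failing on any value clash
# (that failure is the Uniqueness/Consistency condition discussed below).

def solve(equations):
    """equations: (name, attr, value) triples for (name attr) = value;
    a value starting with 'f' names another f-structure."""
    fs = {}                                  # name -> attribute-value dict
    def node(name):
        return fs.setdefault(name, {})
    for name, attr, value in equations:
        f = node(name)
        if isinstance(value, str) and value.startswith("f"):
            value = node(value)              # structural value: share the dict
        if attr in f and f[attr] != value:
            raise ValueError(f"clash on {attr}: {f[attr]} vs {value}")
        f[attr] = value
    return fs

# 'This lion roars', with hypothetical instantiated equations in the
# spirit of (1)-(2): f1 is the clause's f-structure, f2 the subject's.
eqs = [("f1", "SUBJ", "f2"),      # (f1 SUBJ) = f2, from the NP annotation
       ("f1", "TENSE", "PRS"),    # from the suffix -s
       ("f2", "PERS", "3"),       # from -s
       ("f2", "NUM", "SG"),       # from -s, this, and lion
       ("f2", "PRED", "'lion'")]  # from lion
print(solve(eqs)["f1"])           # nested dict mirroring (3)
```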

Collecting all the equations from this tree and solving for *f*<sub>1</sub>, we arrive at the f-structure in (3):

<sup>3</sup>Taken together, the set of equations in a c-structure is called the functional description or *f-description*.

<sup>4</sup> In (2c) the verb is broken down into the root *roar* and third person singular suffix *-s* to show the functional equations contributed by each morpheme. However, LFG does not require that words be analyzed into morphemes. It is compatible with morpheme-based (e.g., Ishikawa 1985; Bresnan et al. 2016: 384–385, 395–396) or realizational morphology (Dalrymple et al. 2019: Chapter 12) or any other theory of morphology that associates word forms with grammatical features.

<sup>5</sup>For simplicity's sake, in this basic example we assume an NP analysis rather than a DP one (Brame 1982). However, much recent LFG work assumes DP; see Bresnan et al. (2016) and Dalrymple et al. (2019) for further discussion.


$$
\text{(3)}\quad f_1\begin{bmatrix}
\text{PRED} & \text{'roar}\langle(f_1\,\text{SUBJ})\rangle\text{'}\\
\text{TENSE} & \text{PRS}\\
\text{SUBJ} & f_2\begin{bmatrix}
\text{PRED} & \text{'lion'}\\
\text{DEF} & +\\
\text{PERS} & 3\\
\text{NUM} & \text{SG}
\end{bmatrix}
\end{bmatrix}
$$

F-structures are subject to three general well-formedness conditions/principles:<sup>6</sup>

1. Completeness: every grammatical function designated in the value of a local PRED feature must be present in the f-structure.
2. Coherence: every governable grammatical function present in the f-structure must be designated by some local PRED value.
3. Uniqueness (Consistency): every attribute of an f-structure has exactly one value.
Completeness and Coherence together play a similar role to the Valence Principle in HPSG (Pollard & Sag 1994: 348) in that they guarantee that exactly the right set of syntactic dependents appears in the structure. Uniqueness means that an f-structure has to be consistent in its feature values<sup>7</sup> and also that the f-structure is a function in the mathematical sense: a set of ordered pairs such that no two pairs have the same first member. Moreover, each PRED value is assumed to bear a unique index (normally suppressed, as here), so that even two instances of apparently the "same" PRED value cannot be identical; this plays an important role in LFG's analysis of pronominal affixes and agreement (see Section 7).
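A rough sketch of how the three conditions could be checked over solved f-structures follows; the `GOVERNABLE` set, the string format for PRED values, and the helper names are our own simplifying assumptions, not LFG's formal definitions.

```python
# A rough sketch of the three well-formedness conditions over solved
# f-structures (dicts as in the previous sketch); GOVERNABLE and the
# PRED string format are simplifications, not the formal definitions.

GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}

def governed(f):
    """Functions selected inside the PRED's angle brackets,
    e.g. "'roar<SUBJ>'" -> {'SUBJ'}."""
    pred = f.get("PRED", "")
    if "<" not in pred:
        return set()
    return set(pred[pred.find("<") + 1 : pred.find(">")].split())

def complete(f):    # every governed function present, with its own PRED
    return all(g in f and "PRED" in f[g] for g in governed(f))

def coherent(f):    # every governable function present is governed
    return all(a in governed(f) for a in f if a in GOVERNABLE)

# Uniqueness is what the clash check in solve() above enforces:
# an attribute can never end up with two distinct values.
clause = {"PRED": "'roar<SUBJ>'", "TENSE": "PRS",
          "SUBJ": {"PRED": "'lion'", "PERS": "3", "NUM": "SG"}}
print(complete(clause), coherent(clause))            # True True
print(complete({"PRED": "'roar<SUBJ>'"}))            # False: no SUBJ
print(coherent(dict(clause, OBJ={"PRED": "'yam'"}))) # False: OBJ not governed
```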

Since the up and down arrows refer to nodes of the local sub-tree, LFG annotated phrase structure rules like those in (1) can often be directly translated into HPSG immediate dominance schemata and principles constraining local sub-trees. By way of illustration, let FS (for *f-structure*) be an HPSG attribute corresponding to the f-structure projection function. Then the LFG rule in (7a) (repeated from (1a) above) is equivalent to the HPSG rule in (7b):

$$
\begin{array}{llll}
\text{(7) a.} & \text{S} \;\rightarrow & \text{NP} & \text{VP}\\
& & (\uparrow\,\text{SUBJ}) = \downarrow & \uparrow = \downarrow\\[0.6em]
\text{b.} & \multicolumn{3}{l}{\text{S}[\text{FS}\;\boxed{1}] \;\rightarrow\; \text{NP}[\text{FS}\;\boxed{2}] \quad \text{VP}[\text{FS}\;\boxed{1}[\text{SUBJ}\;\boxed{2}]]}
\end{array}
$$

Let us compare the two representations with respect to heads and dependents.

<sup>6</sup>Here we state them informally, just to capture the key intuitions, but see Kaplan & Bresnan (1982: 37 (reprint pagination)) or Dalrymple et al. (2019: 52–53) for precise definitions.

<sup>7</sup> In fact, another name for the uniqueness principle is *Consistency* (Dalrymple et al. 2019: 53).

Taking heads first, the VP node annotated with ↑ = ↓ is an *f-structure head*, meaning that the features of the VP are identified with those of the mother S. This effect is equivalent to the tag [1] in (7b). Hence ↑ = ↓ has an effect similar to HPSG's Head Feature Principle. However, in LFG the part of speech categories and their projections such as N, V, Det, NP, VP, DP, etc. belong to the c-structure and not the f-structure. As a consequence, those features are not subject to sharing, and any principled correlations between such categories, such as the fact that N is the head of NP, V the head of VP, C the head of CP, and so on, are instead captured in an explicit version of (extended) X-bar theory applying exclusively to the c-structure (Grimshaw 1998). The version of extended X-bar theory in Bresnan et al. (2016: Chapter 6) assumes that all nodes on the right side of the arrow of the phrase structure rule are optional, with many unacceptable partial structures ruled out in the f-structure instead. Also, not all structures need to be endocentric (i.e., not all structures have a head daughter in c-structure). The LFG category S shown in (7a) is inherently exocentric, lacking a c-structure head whose c-structure category could influence its external syntax (the f-structure head of S is the daughter with the ↑ = ↓ annotation, here the VP). (English is also assumed to have endocentric clauses of category IP, where an auxiliary verb of category I (for Inflection) serves as the c-structure head.) S is used for copulaless clauses and also for the flat structures of nonconfigurational clauses in languages such as Warlpiri (see Section 10 below).

Functional projections like DP, IP, and CP are typically assumed to form a "shell" over the lexical projections NP, VP, AP, and PP (plus CP can appear over S). While this assumption is widespread in transformational approaches, its origins can be found in non-transformational research, including early LFG: CP was proposed in Fassi-Fehri's (1981: 141) LFG treatment of Arabic, and IP (the idea of the sentence as functional projection) is found in Falk's (1983) LFG analysis of the English auxiliary system (Falk called it "MP" instead of "IP").<sup>8</sup>

Extended projections are formally implemented in LFG by having the functional head (such as D) and its lexical complement (such as NP) be f-structure co-heads. See for example the DP *this lion* in Figure 3, where D, NP, and N are all annotated with ↑ = ↓, hence the DP, D, NP, and N nodes all map to the same f-structure. What makes this identity possible is that function words lack a PRED feature that would otherwise indicate a semantic form.<sup>9</sup> Content words such as *lion* have such a feature ([PRED 'lion']), and so if the D had one as well, then they would clash in the f-structure. Note more generally that the f-structure flattens out much of the hierarchical structure of the corresponding c-structure.

<sup>8</sup>For more on the origins of extended projections see Bresnan et al. (2016: 124–125).

<sup>9</sup>The attribute PRED ostensibly stands for "predicate", but really it means something more like "has semantic content", as there are lexical items, such as proper names, which have PRED features but are not predicates under standard assumptions.


Figure 3: Functional heads in LFG

Complementation works a little differently in LFG from HPSG. Note that the LFG rule (7a) indicates the SUBJ grammatical function on the subject NP node, while the pseudo-HPSG rule (7b) indicates the SUBJ function on the VP functor selecting the subject. A consequence of the use of functional equations in LFG is that a grammatical relation such as SUBJ can be locally associated with its formal exponents, whether a configurational position in phrase structure (as in Figure 1), head-marking (agreement, see (2c)), or dependent marking (case). A nominative case affix specialized for exclusively marking subjects can introduce a so-called "inside-out" functional designator, (SUBJ ↑), which requires that the f-structure of the NP or DP bearing that case ending be the value of a SUBJ attribute (Nordlinger 1998).<sup>10</sup> Other argument cases effectively resolve an annotation on the NP or DP node to the appropriate grammatical function. In all of these situations, the attribute encoding a grammatical function, such as SUBJ or OBJ, is directly associated with an element filling that function. This aspect of LFG representations makes it convenient for functionalist and typological work on grammatical relations.

<sup>10</sup>Inside-out designators can be identified by their special syntax: the attribute symbol (here SUBJ) precedes instead of following the function symbol (here the metavariable ↑). They are defined as follows: for function *f*, attribute *a*, and value *v*, (*a f*) = *v* iff (*v a*) = *f*. If *f* is the f-structure representing a nominal in nominative case, then (SUBJ *f* ) refers to the f-structure of the clause whose subject is that nominal.
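The definition in this footnote can be read as a simple search; here is a sketch, with an invented `inside_out` helper over dict-based f-structures.

```python
# Footnote 10's definition, transcribed directly: (a f) = v iff (v a) = f.
# The inside_out helper and dict encoding are our own illustration.

def inside_out(attr, f, candidates):
    """All f-structures v among candidates such that (v attr) is f."""
    return [v for v in candidates if v.get(attr) is f]

nominal = {"PRED": "'pro'", "CASE": "NOM"}          # a nominative subject
clause = {"PRED": "'rule<SUBJ>'", "SUBJ": nominal}
# (SUBJ f) for the nominal picks out the clause whose SUBJ it is:
print(inside_out("SUBJ", nominal, [clause, nominal]) == [clause])  # True
```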


# **4 Grammatical functions and valence**

*Grammatical functions* (or *grammatical relations*) like subject and object play an important role in LFG theory, and it is worthwhile to compare the status of grammatical functions in LFG and HPSG. Grammatical functions in LFG are best understood in the context of the break with Transformational Grammar that led to LFG and other alternative frameworks. Chomsky (1965: 68–74) argued that while grammatical functions clearly play a role in grammar, they need not be explicitly incorporated into the grammar as such. Instead he proposed to define them in phrase structural terms: the "subject" is the NP immediately dominated by S, the "object" is the NP immediately dominated by VP, and so on. This theoretical assumption necessitated the use of transformations and a profusion of certain null elements: an affixal subject, for example, would have to be inserted under an NP node that is the daughter of S, and then moved to its surface position as affix; a subject that is phonologically null but anaphorically active would have to be generated in that position as well, hence the need for "null *pro*". Early alternatives to Transformational Grammar such as Relational Grammar (Perlmutter 1983) and LFG also sought to capture the equivalence across different alternative expressions of a single argument, but rejected the transformational model. Instead the grammar licenses an abstract representation of the "subject", for example, essentially as a set of features. The language-specific grammar maps a predicator's semantic roles to these abstract representations (named "subject", "object", and so on), and also maps those representations onto the expressions in the language.

As a non-transformational theory, HPSG breaks down the relation between semantic roles and their grammatical expressions into two distinct mappings, in roughly the same way that LFG does. The intermediate representations of arguments consist of sets of features, just like in LFG. However, while LFG provides a consistent cross-linguistic representation of each grammatical function type, many variants of the HPSG framework allow the representation to vary from language to language. For example, LFG identifies subjects in every language with the attribute SUBJ, while HPSG identifies the subject, if at all, as a specific element of the ARG-ST list, but which element it is can vary with the language. An English subject is usually assumed to be the first element in the list, or ARG-ST|FIRST, while the definition of a German subject necessarily involves the CASE feature (Reis 1982). Note that it would not work to represent German subjects as ARG-ST|FIRST, because in German subjectless sentences with arguments (such as datives), the first item in the ARG-ST list is a non-subject. Also, the HPSG valence feature SUBJ (or SPR "specifier") does not represent the subject grammatical function in the LFG sense, but is instead closer to Chomsky's (1965: 68–74) definition of subject as an NP in a particular phrase structural position. However, nothing precludes an HPSG practitioner from adopting LFG-style grammatical function features within HPSG, and many have done so, such as the VAL feature of Kropp Dakubu et al. (2007) or the ERG feature proposed by Pollard (1994). Similarly, Andrews & Manning (1999) present a version of LFG that incorporates an HPSG-style valence list.

This illustrates once again LFG's orientation towards cross-linguistic grammar comparison. HPSG practitioners who do not adopt an LFG-style SUBJ feature can still formulate theories of "subjects", comparing subjects in English, German, Icelandic, and so on, but they lack a formal correlate of the notion "subject" comparable to LFG's SUBJ feature. Meanwhile from the perspective of a single grammar, the original motivation for adopting grammatical functions, in reaction to the pan-phrase-structural view of the transformational approach, informed both LFG and HPSG in the same way.

In LFG, a lexical predicator such as a verb selects its complements via grammatical functions, which are native to f-structure, rather than using c-structure categories. A transitive verb selects a SUBJ and OBJ, which are features of f-structure, but it cannot select for the category DP because such part of speech categories belong only to c-structure. For example, the verb stem *roar* in (2c) has a PRED feature whose value contains (↑ SUBJ), which has the effect of requiring a SUBJ function in the f-structure. The f-structure (shown in (3)) is built using the defining equations, as described above. Then that f-structure is checked against any *existential constraints* such as the expression (↑ SUBJ), which requires that the f-structure contain a SUBJ feature. That constraint is satisfied, as shown in (3). Moreover, the fact that (↑ SUBJ) appears in the angled brackets means that it expresses a semantic role of the 'roar' relation, hence the SUBJ value is required to contain a PRED feature, which is satisfied by the feature [PRED 'lion'] in (3).

Selection for grammatical relations instead of formal categories enables LFG to capture the flexibility in the expression of a given grammatical relation described at the end of the previous section. As noted there, in many languages the subject can be expressed either as an independent NP/DP phrase as in English, or as a pronominal affix on the verb. As long as the affix introduces a PRED feature and is designated by the grammar as filling the SUBJ relation, it satisfies the subcategorization requirements imposed by a verb. A more subtle example of flexible expression of grammatical functions can be seen in English constructions where an argument can in principle take the form of either a DP (as in (8a)) or a clause (as in (8b)) (the examples in (8) are from Bresnan et al. 2016: 11–12).


	- a. That problem, we talked about for days.
	- b. That he was sick, we talked about for days.
	- c. We talked about that problem for days.
	- d. \* We talked about that he was sick for days.

The preposition *about* selects neither a DP nor a CP *per se*, but rather selects the grammatical function OBJ.

(9) *about*: P (↑ PRED) = 'about⟨(↑ OBJ)⟩'

It is not the preposition but the local c-structure environment that conditions the category of that argument: the canonical prepositional object position right-adjacent to *about* can only house a DP (compare (8c) and (8d)), while the topic position allows either DP or CP (compare (8a) and (8b)). In LFG, the grammatical functions such as SUBJ and OBJ represent equivalence classes across various modes of c-structure expression.

Some HPSG approaches to filler-gap mismatches are presented in Borsley & Crysmann (2021: Section 9), Chapter 13 of this volume. They are essentially similar to the LFG account just presented, in that they work by allowing the preposed clause and the gap to differ in certain features. The main difference between research on this problem in the two frameworks is that in LFG the bifurcation between matching and non-matching features of filler/gap is built into the framework in the separation between f-structure and c-structure.<sup>11</sup>

# **5 Head mobility**

The lexical head of a phrase can sometimes appear in an alternative position apparently outside of what would normally be its phrasal projection. Assuming that an English finite auxiliary verb is the (category I) head of its (IP) clause, then that auxiliary appears outside its clause in a yes/no question:

(10) a. [IP she is mad]

b. Is [IP she \_ mad]?

<sup>11</sup>Note that c-structure and f-structure are autonomous but not independent. To constrain the relation between the c-structure category and f-structure, one can use either metacategories (Dalrymple et al. 2019: 691–698) or the CAT function (Dalrymple et al. 2019: 265).

Transformational grammars capture the systematic relation between these two structures with a head-movement transformation that leaves the source IP structure intact, with a trace replacing the moved lexical head of the clause. The landing site of the moved clausal head is often assumed to be C, the complementizer position, as motivated by complementarity between the fronted verb and a lexical complementizer. This complementarity is observed most strikingly in German verb-second versus verb-final alternations, but is also found in other languages, including some English constructions such as the following:

	- a. I wonder whether [IP she is mad].
	- b. I wonder, is [IP she \_ mad]?
	- c. \* I wonder whether is she mad.

Non-transformational frameworks like HPSG and LFG offer two alternative approaches to head mobility, described in the HPSG context in Müller (2021: Section 5), Chapter 10 of this volume. Let us consider these in turn.

In the constructional approach, the sentences in (11) have been treated as displaying two distinct structures licensed by the grammar (Sag et al. 2020 and other references in Müller 2021: Section 5, Chapter 10 of this volume). For example, assuming ternary branching for the sentence in (10b), the subject DP *she* and predicate AP *mad* would normally be assumed to be sisters of the fronted auxiliary *is*. On that analysis, the phrase structure is flattened out so that *she mad* is not a constituent. In fact, for English the fronting of *is* can even be seen as a consequence of that flattening: English is a head-initial language, so the two dependents *she* and *mad* are expected to follow their selecting head *is*. This analysis is common in HPSG, and it could be cast within the LFG framework as well.

The second approach is closer in spirit to the head movement posited in Transformational Grammar. It is found in both HPSG and LFG, but the formal implementation is rather different in the two frameworks. The HPSG "head movement" account due to Borsley (1989) posits a phonologically empty element that can function as the verb in its canonical position (such as (10a)), and that empty element is structure-shared with the verb that is fronted. This treats the variation in head position similarly to non-local dependencies involving phrases. See Müller (2021: Section 5.1), Chapter 10 of this volume.

The LFG version of "head movement" takes advantage of the separation of autonomous c- and f-structures. Recall from the above discussion of the DP in Figure 3 that functional heads such as determiners, auxiliaries, and complementizers do not introduce new f-structures, but rather map to the same f-structure as their complement phrases. The finite auxiliary can therefore appear in either the I or C position without this difference in position affecting the f-structure, as we will see presently. Recall also that c-structure nodes are optional and can be omitted as long as a well-formed f-structure is generated. Comparing the non-terminal structures of Figure 4 and Figure 5, the I preterminal node is omitted from the latter structure, but otherwise they are identical. (The lexical equations in Figure 5 are the same as the ones in Figure 4 but are omitted for clarity.) Given the many ↑ = ↓ annotations, the C, I, and AP nodes (as well as IP and CP) all map to the same f-structure, namely the one shown in (12).

$$
\text{(12)}\quad f\begin{bmatrix}
\text{SUBJ} & \begin{bmatrix}
\text{PRED} & \text{'pro'}\\
\text{PERS} & 3\\
\text{GEND} & \text{FEM}
\end{bmatrix}\\
\text{PRED} & \text{'mad}\langle(f\,\text{SUBJ})\rangle\text{'}\\
\text{FIN} & +
\end{bmatrix}
$$

Figure 5: Head mobility in LFG

The C and I positions are appropriate for markers of clausal grammatical features such as finiteness ([FIN ±]), encoded either by auxiliary verbs like finite *is* or complementizers like finite *that* and infinitival *for*: *I said that/\*for she is present* vs. *I asked for/\*that her to be present*. English has a specialized class of auxiliary verbs for marking finiteness from the C position, while in languages like German all finite verbs, including main verbs, can appear in a C position that is unoccupied by a lexical complementizer. Summarizing, the LFG framework enables a theory of head mobility based on the intuition that a clause has multiple head positions where inflectional features of the clause are encoded.

# **6 Agreement, case, and constraining equations**

The basic theory of agreement is the same in LFG and HPSG (see Wechsler 2021, Chapter 6 of this volume): agreement occurs when multiple feature sets arising from distinct elements of a sentence specify information about a single abstract object, so that the information must be mutually consistent (Kay 1984). The two forms are said to agree when the values imposed by the two constraints are compatible, while ungrammaticality results when they are incompatible. An LFG example is seen in Figure 1 above, where the noun, determiner, and verbal suffix each specify person and/or number features of the same SUBJ value.

The basic mechanism for case marking works in essentially the same way as agreement in both frameworks: in case marking, distinct elements of a sentence specify case information about a single abstract object, hence that information must be compatible. To account for the contrast in (13a), nominative CASE equations are associated with the pronoun *she* and added to the entry for the verbal agreement suffix *-s*:

(13) a. She/\*Her/\*You rules.

b. *she*: D (↑ PRED) = 'pro'
(↑ CASE) = NOM
(↑ PERS) = 3
(↑ NUM) = SG
(↑ GEND) = FEM

c. *her*: D (↑ PRED) = 'pro'
(↑ CASE) = ACC
(↑ PERS) = 3
(↑ NUM) = SG
(↑ GEND) = FEM

d. *you*: D (↑ PRED) = 'pro'
(↑ PERS) = 2

e. -*s*: *infl* (↑ TENSE) = PRES
(↑ SUBJ) = ↓
(↓ PERS) = 3
(↓ NUM) = SG
(↓ CASE) = NOM

The variant of (13a) with *her* as subject is ruled out due to a clash of CASE within the SUBJ f-structure. The variant with *you* as subject is ruled out due to a clash of PERS features. This mechanism is essentially the same as in HPSG, where it operates via the valence features.
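A sketch of that shared mechanism follows, under the assumption that grammaticality amounts to consistency of merged feature descriptions; the `merge` helper and the dict entries (adapted from (13)) are illustrative only.

```python
# Sketch of (13): the pronoun and the finite verb's suffix each
# describe the same SUBJ f-structure, and grammaticality is just
# consistency of the merged description. Entries adapted from (13);
# the merge helper is an illustration, not the LFG algorithm.

def merge(*descriptions):
    out = {}
    for d in descriptions:
        for attr, value in d.items():
            if attr in out and out[attr] != value:
                return None                 # feature clash: ungrammatical
            out[attr] = value
    return out

s_suffix = {"PERS": "3", "NUM": "SG", "CASE": "NOM"}   # from -s in (13e)
she = {"PRED": "'pro'", "CASE": "NOM", "PERS": "3",
       "NUM": "SG", "GEND": "FEM"}                     # (13b)
her = dict(she, CASE="ACC")                            # (13c)
you = {"PRED": "'pro'", "PERS": "2"}                   # (13d)

print(merge(she, s_suffix) is not None)   # True:  'She rules.'
print(merge(her, s_suffix) is not None)   # False: '*Her rules.' (CASE clash)
print(merge(you, s_suffix) is not None)   # False: '*You rules.' (PERS clash)
```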

This account allows for underspecification of both the case assigner and the case-bearing element, and of both the trigger and target of agreement. In English, for example, gender is marked on some pronouns but not on a verbal affix, and (nominative) case is not marked on nominals, with the exception of pronouns, but is governed by the finite verb. But certain case and agreement phenomena do not tolerate underspecification, and for those phenomena LFG offers an account using a *constraining equation*, a mechanism absent from HPSG and indeed ruled out by current foundational assumptions of HPSG theory (Richter 2021, Chapter 3 of this volume). (Some early precursors to HPSG included a special feature value called ANY that functioned much like an LFG constraining equation, e.g. Shieber 1986: 36–37, but that device has been eliminated from HPSG.) The functional equations described so far in this chapter work by building the f-structure, as illustrated in Figure 1 and (3) above; such equations are called *defining equations*. A constraining equation has the same syntax as a defining equation, but it functions by checking the completed f-structure for the presence of a feature. An f-structure lacking the feature designated by the constraining equation is ill-formed.

The following lexical entry for *she* is identical to the one in (13b) above, except that the CASE equation has been replaced with a constraining equation, notated with =<sub>c</sub>.

(14) *she*: D (↑ PRED) = 'pro'
(↑ CASE) =<sub>c</sub> NOM
(↑ PERS) = 3
(↑ NUM) = SG
(↑ GEND) = FEM

The f-structure is built from the defining equations, after which the SUBJ field is checked for the presence of the [CASE NOM] feature, as indicated by the constraining equation. If this feature has been contributed by the finite verb, as in (13), then the sentence is predicted to be grammatical; if there is no finite verb (and there is no other source of nominative case) then it is ruled out. This predicts the following grammaticality pattern:

	- a. She did! / \*Her did!
	- b. \* She! / Her!

English nominative pronouns require the presence of a finite verb, here the finite auxiliary *did*. Constraining equations operate as output filters on f-structures and are the primary way to grammatically specify the obligatoriness of a form, especially under the assumption that all daughter nodes are optional in the phrase structure. As described in Section 4 above, obligatory dependents are specified in the lexical form of a predicator using existential constraints like (↑ SUBJ) or (↑ OBJ). These are equivalent to constraining equations in which the particular value is unspecified, but some value must appear in order for the f-structure to be well-formed.

A constraining equation for case introduced by the case-*assigner*, rather than the case-bearing element, predicts that the appropriate case-bearing element must appear. A striking example from Serbo-Croatian is described by Wechsler & Zlatić (2003: 134), who give this descriptive generalization:


(16) Serbo-Croatian Dative/Instrumental Case Realization Condition: If a verb or noun assigns dative or instrumental case to an NP, then that case must be morphologically realized by some element within the NP.

In Serbo-Croatian most common nouns, proper nouns, adjectives, and determiners are inflected for case. An NP in a dative position must contain at least one such item morphologically inflected for dative case, and similarly for instrumental case. The verb *pokloniti* 'give' governs a dative object, such as *ovom studentu* in (17a). But a quantified NP like *ovih pet studenata* 'these five students' has invariant case, namely genitive on the determiner and noun, and an undeclinable numeral *pet* 'five'. Such a quantified NP can appear in any case position, except when it fails to satisfy the condition in (16), such as this dative position (Wechsler & Zlatić 2003: 125):

	- a. pokloniti knjige [ovom studentu]
give.INF books.ACC this.DAT.SG student.DAT.SG
'to give books to this student'
	- b. \* pokloniti knjige [ovih pet studenata]
give.INF books.ACC this.GEN.PL five student.GEN.PL
intended: 'to give books to these five students'

Similarly, certain foreign names such as *Miki* and loanwords such as *braon* 'brown, brunette' are undeclinable, and can appear in any case position, except those ruled out by (16). Thus example (18a) is unacceptable, while the inflected possessive adjective *mojoj* 'my' saves it, as shown in (18b): when the possessive adjective realizes the case feature, the phrase is acceptable. In (18c) we contrast the undeclined loanword *braon* 'brown' with the inflected form *lepoj* 'beautiful'. The example is acceptable only with the inflected adjective (Wechsler & Zlatić 2003: 134).


	- a. \* Divim se Miki.
admire.1SG REFL Miki
intended: 'I admire Miki.'
	- b. Divim se mojoj Miki.
admire.1SG REFL my.DAT.SG Miki
'I admire my Miki.'
	- c. Divim se \*braon / lepoj Miki.
admire.1SG REFL brown / beautiful.DAT.SG Miki
intended: 'I admire brunette / beautiful Miki.'


This complex distribution is captured simply by positing that the dative (and instrumental) case assigning equations on verbs and nouns, such as the verbs *pokloniti* and *divim* in the above examples, are constraining equations:

(19) (↑ OBL CASE) =<sub>c</sub> DAT

Any item in dative form within the NP, such as *ovom* or *studentu* in (17a) or *mojoj* or *lepoj* in (18b,c), could introduce the [CASE DAT] feature that satisfies this equation, but if none appears then the sentence fails. In contrast, other case-assigning equations (e.g., for nominative, accusative, or genitive case, or for cases assigned by prepositions) are defining equations, which therefore allow the undeclined NPs to appear. This sort of phenomenon is easy to capture using an output filter such as a constraining equation, but rather difficult otherwise. See Wechsler & Zlatić (2001) for further examples and discussion.
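A sketch of the Serbo-Croatian pattern under our simplification: each word in the NP either contributes a CASE value to the NP's f-structure or, if undeclinable in that position, contributes nothing, and the verb's constraining equation (19) is checked afterwards.

```python
# Sketch of (16)/(19): the verb's constraining equation demands that
# [CASE DAT] was actually contributed by some NP-internal word; the
# word-to-case mapping below is our own simplified encoding.

def realizes_dative(contributions):
    """contributions: word -> CASE value it can realize in this
    position (None for undeclinable items)."""
    return "DAT" in {c for c in contributions.values() if c}

print(realizes_dative({"ovom": "DAT", "studentu": "DAT"}))   # (17a): True
print(realizes_dative({"ovih": None, "pet": None,
                       "studenata": None}))                  # (17b): False
print(realizes_dative({"mojoj": "DAT", "Miki": None}))       # (18b): True
```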

# **7 Agreement and affixal pronouns**

Agreement inflections that include the person feature derive historically from incorporated pronominal affixes. Distinguishing between agreement markers and affixal pronouns can be a subtle and controversial matter. LFG provides a particular formal device for representing this distinction within the f-structure: a true pronoun, whether affixal or free, introduces a semantic form (formally a PRED feature) with the value 'pro', while an agreement inflection does not. For example, Bresnan & Mchombo (1987) argue that the Chicheŵa (Bantu) object marker (OM) is an incorporated pronoun, while the subject marker (SM) alternates between agreement and incorporated pronoun, as in this example:

(20) Njûchi zi-ná-wá-lum-a a-lenje. (Chicheŵa)
10.bee 10.SM-PST-2.OM-bite-FV 2-hunter
'The bees bit them, the hunters.'

According to Bresnan & Mchombo (1987: 745), the class 2 object marker *wá-* is a pronoun, so the phrase *alenje* 'the hunters' is not the true object, but rather a postposed topic cataphorically linked to the object marker, with which it agrees in noun class (a case of *anaphoric agreement*). Meanwhile, the class 10 subject marker *zi-* alternates: when an associated subject NP (*njûchi* 'bees' in (20)) appears, then it is a grammatical agreement marker, but when no subject NP appears, then it functions as a pronoun. This is captured in LFG with the simplified lexical entries in (21):


(21) a. *lum-* 'bite': V (↑ PRED) = 'bite⟨(↑ SUBJ) (↑ OBJ)⟩'

b. *wá-* (OM): (↑ OBJ PRED) = 'pro'
(↑ OBJ CLASS) = 2

c. *zi-* (SM): ((↑ SUBJ PRED) = 'pro')
(↑ SUBJ CLASS) = 10
The PRED feature in (21b) is obligatory while that of (21c) is optional, as indicated by the parentheses around the latter. These entries interact with the grammar in the following manner. The two grammatical functions governed by the verb in (21a) are SUBJ and OBJ (the *governed* functions are the ones designated in the predicate argument structure of a predicator). According to the Principle of Completeness, a PRED feature must appear in the f-structure for each governed grammatical function that appears within the angle brackets of a predicator (indicating assignment of a semantic role). By the uniqueness condition it follows that there must be *exactly one* PRED feature, since a second such feature would cause a clash of values.<sup>12</sup>

The OM *wá-* introduces the [PRED 'pro'] into the object field of the f-structure of this sentence; the word *alenje* 'hunters' introduces its own PRED feature with value 'hunter', so it cannot be the true object, and instead is assumed to be in a topic position. Bresnan & Mchombo (1987: 744–745) note that the OM can be omitted from the sentence, in which case the phrasal object (here, *alenje*) is fixed in the immediately post-verbal position, while that phrase can alternatively be preposed when the OM appears. This is explained by assuming that the postverbal position is an OBJ position, while the adjoined TOPIC position is more flexible.

The subject *njûchi* can be omitted from (20), yielding a grammatical sentence meaning 'They (some class 10 plural entity) bit them, the hunters.' The optional PRED feature equation in (21c) captures this pro-drop property: when the equation appears, then a phrase such as *njûchi* cannot appear in the subject position, since this would lead to a clash of PRED values ('pro' versus 'bee'); but when the equation is not selected, then *njûchi* must appear in the subject position in order for the f-structure to be complete.
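A sketch of how the optionality in (21c) interacts with Uniqueness and Completeness follows; the helper and its truth-valued arguments are illustrative stand-ins for the grammar's choices.

```python
# Sketch of the alternation driven by the optional equation in (21c):
# each way of resolving the option is checked against Uniqueness (at
# most one PRED per function) and Completeness (at least one). The
# helper and its boolean arguments are illustrative stand-ins.

def subj_pred(sm_contributes_pro, overt_np_pred):
    preds = []
    if sm_contributes_pro:
        preds.append("'pro'")           # zi- used as incorporated pronoun
    if overt_np_pred:
        preds.append(overt_np_pred)     # e.g. njuchi 'bees'
    if len(preds) > 1:
        return None                     # Uniqueness: two PREDs clash
    if not preds:
        return None                     # Completeness: SUBJ lacks a PRED
    return preds[0]

print(subj_pred(False, "'bee'"))  # agreement reading: SUBJ PRED = 'bee'
print(subj_pred(True, None))      # pro-drop reading:  SUBJ PRED = 'pro'
print(subj_pred(True, "'bee'"))   # None: clash, that option is filtered out
print(subj_pred(False, None))     # None: incomplete
```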

<sup>12</sup>Recall that each PRED value is assumed to be unique, so that two 'pro' values cannot unify.

The diachronic process in which a pronominal affix is reanalyzed as agreement has been modeled in LFG as the historic loss of the PRED feature, along with the retention of the pronoun's person, number, and gender features (Coppock & Wechsler 2010). The anaphoric agreement of the older pronoun with its antecedent then becomes reanalyzed as grammatical agreement of the inflected verb with an external nominal. Finer transition states can also be modeled in terms of selective feature loss. Clitic doubling can be modeled as optional loss of the PRED feature with retention of some semantic vestiges of the pronominal, such as specificity of reference.

The LFG analysis of affixal pronouns and agreement inflections can be translated into HPSG by associating the former but not the latter with pronominal semantics in the CONTENT field. The complementarity between the appearance of a pronominal inflection and an analytic phrase filling the same grammatical relation would be modeled as an exclusive disjunction between those two options, which captures the effects of the uniqueness of PRED values in LFG.

# **8 Lexical mapping**

LFG and HPSG both adopt *lexical approaches to argument structure* in the sense of Müller & Wechsler (2014): a verb or other predicator is equipped with a valence structure indicating the grammatical expression of its semantic arguments as syntactic dependents. Both frameworks have complex systems for mapping semantic arguments to syntactic dependents that are designed to capture prevailing semantic regularities within a language and across languages. The respective systems differ greatly in their notation and formal properties, but it is unclear whether there are any theoretically interesting differences, such as types of analysis that are available in one but not the other. This section identifies some of the most important analogues across the two systems, namely LFG's Lexical Mapping Theory (LMT; Bresnan et al. 2016: Chapter 14) and the theory of macroroles proposed by Davis and Koenig for HPSG (Davis 1996; Davis & Koenig 2000; see Davis, Koenig & Wechsler 2021: Section 4, Chapter 9 of this volume).<sup>13</sup>

<sup>13</sup>A recent alternative to LMT based on Glue Semantics has been developed by Asudeh & Giorgolo (2012) and Asudeh et al. (2014), incorporating the formal mapping theory of Findlay (2016), which in turn is based on Kibort (2007). It preserves the monotonicity of LMT, discussed below, but uses Glue Semantics to model argument realization and valence alternations. Müller (2018) discusses this Glue approach in contrast to HPSG approaches in light of lexicalism and argument structure. Note, though, that despite what might be implied by the title of the Müller volume, the Asudeh et al. treatment is not *necessarily* "phrasal" (i.e., non-lexicalist). The Glue framework assumed by Asudeh et al. can accommodate either a lexicalist or non-lexicalist position. It is a theoretical matter as to which is correct.

<sup>14</sup>The particular ordering proposed in Bresnan & Kanerva 1989: 23 and Bresnan et al. 2016: 329 is the following: *agent* > *beneficiary* > *experiencer/goal* > *instrument* > *patient/theme* > *locative*.

In LMT, the argument structure is a list of a verb's argument slots, each labeled with a thematic role type such as Agent, Instrument, Recipient, Patient, Location, and so on, in the tradition of Charles Fillmore's *Deep Cases* (Fillmore 1968; 1977) and Pāṇini's *kārakas* (Kiparsky & Staal 1969). The ordering is determined by a thematic hierarchy that reflects priority for subject selection.<sup>14</sup> The thematic role type influences a further classification by the features [±r] and [±o] that condition the mapping to syntactic functions (this version is from Bresnan et al. 2016: 331):

(22) Semantic classification of argument structure roles for function:

patientlike roles: [−r]
secondary patientlike roles: [+o]
other semantic roles: [−o]

The features [±r] (thematically *restricted*) and [±o] (*objective*) cross-classify grammatical functions: subject is [−r, −o], object is [−r, +o], obliques are [+r, −o] and restricted objects are [+r, +o]. A monotonic derivation (where feature values cannot be changed) starts from the argument list with the Intrinsic Classification (I.C. in example (23) below), then morpholexical operations such as passivization can suppress a role (not shown), then the thematically highest role (such as the Agent), if [−o], is selected as Subject, and then any remaining features receive positive values by default.

(23) ⟨ agent , patient ⟩
I.C.: [−o] , [−r]
Subject selection: SUBJ , –
defaults: – , OBJ
<sup>15</sup>Note, for example, that within this system the "Undergoer" argument of the English verb *undergo*, as in *John underwent an operation*, is the object, and not the subject as one might expect if being an Undergoer involved actually undergoing something.

In the macro-role theory formulated for HPSG, the analogues of [−o] and [−r] are the macro-roles Actor (ACT) and Undergoer (UND), respectively. The names of these features reflect large general groupings of semantic role types, but there is not a unique semantic entailment such as "agency" or "affectedness" associated with each of them. Actor and Undergoer name whatever semantic roles map to the subject and object, respectively, of a transitive verb.<sup>15</sup> On the semantic side they are disjunctively defined: *x* is the Actor and *y* is the Undergoer iff "*x* causes a change in *y*, or *x* has a notion of *y*, or …" (quoted from Davis, Koenig & Wechsler 2021: 328, Chapter 9 of this volume). Such disjunctive definitions are the HPSG analogues of the LMT "semantic classifications" shown in (22) above. In the HPSG macro-role system, linking constraints dictate that the ACT argument maps to the first element of ARG-ST, and that the UND argument maps to some nominal element of ARG-ST; (24) and (25) are from Davis, Koenig & Wechsler (2021: 329), Chapter 9 of this volume (each set of dots in the list value of ARG-ST represents zero or more list items):

$$
\begin{array}{ll}
\text{(24)} & \begin{bmatrix}
\text{CONTENT|KEY} & \big[\text{ACT}\;\boxed{1}\big]\\
\text{ARG-ST} & \big\langle \text{NP}[\boxed{1}],\, \dots \big\rangle
\end{bmatrix}\\[1.5em]
\text{(25)} & \begin{bmatrix}
\text{CONTENT|KEY} & \big[\text{UND}\;\boxed{2}\big]\\
\text{ARG-ST} & \big\langle \dots,\, \text{NP}[\boxed{2}],\, \dots \big\rangle
\end{bmatrix}
\end{array}
$$

The first element of ARG-ST maps to the subject of an active voice verb, so (24)–(25) imply that if there is an ACT, then that ACT is the subject, and otherwise the UND is the subject (in the latter case NP[2] is the initial item in the list, given that the ellipsis represents zero or more items). Similarly, in LMT as described above, the subject is the highest [−o] argument, if there is one, and otherwise it is the [−r] argument. In this simple example we can see how the two systems accomplish exactly the same thing. A careful examination of more complex examples might point up theoretical differences, but it seems more likely that the two systems can express virtually the same set of mappings.
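A side-by-side sketch of the two subject-selection procedures as just described; the function names, feature spellings, and data layout are our own simplifications.

```python
# Side-by-side sketch of subject selection in the two systems as
# described in the text; names and layout are our own.

def lmt_subject(roles):
    """roles: thematically ordered (role, intrinsic-features) pairs."""
    for role, feats in roles:        # highest [-o] role, if any...
        if "-o" in feats:
            return role
    for role, feats in roles:        # ...otherwise a [-r] role
        if "-r" in feats:
            return role

def hpsg_subject(arg_st):
    """arg_st: macrorole labels in ARG-ST order; the linking
    constraints (24)-(25) put ACT first when there is one."""
    return arg_st[0]

active = [("agent", {"-o"}), ("patient", {"-r"})]
passive = [("patient", {"-r"})]      # agent suppressed, as in (26)
print(lmt_subject(active), lmt_subject(passive))            # agent patient
print(hpsg_subject(["ACT", "UND"]), hpsg_subject(["UND"]))  # ACT UND
```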

In LFG the *argument structure* (or *a-structure*) contains the predicator and its argument roles classified and ordered by thematic role type and further classified by Intrinsic Classification. It is considered a distinct level of representation, along with c-structure and f-structure. As a consequence, the grammar can make reference to the initial item in a-structure, such as the agent in (23), which is considered the "most prominent" role and often called the *a-subject* ("argument structure subject") in LFG parlance. To derive the passive voice mapping, the a-subject is suppressed in a morpholexical operation that crucially takes place before the subject is selected:

(26) Derivation of *eaten* as in *A yam was eaten (by Pam)*:

⟨ agent , patient ⟩
I.C.: [−o] , [−r]
passive: ∅
Subject selection: – , SUBJ
(The optional *by*-phrase is considered to be an adjunct realizing the passivized a-subject.) Note that the passive alternation is *not* captured by a procedural rule that replaces one grammatical relation (such as OBJ) with another (such as SUBJ). The mapping from word strings to f-structures in LFG is monotonic, in the sense that information cannot be destroyed or changed. As a result, the mapping between internal and external structures is said to be transparent in the sense that the grammatical relations of parts of the sentence are preserved in the whole (for discussion of this point, see Bresnan et al. 2016: Chapter 5). In early versions of LFG, monotonicity was assumed for the syntax proper, while destructive procedures were permitted in the lexicon. This was canonized in the *Principle of Direct Syntactic Encoding*, according to which all grammatical relation changes are lexical (Kaplan & Bresnan 1982: 180; Bresnan et al. 2016: 77). At that time, an LFG passive rule operated on fully specified predicate argument structures, replacing OBJ with SUBJ, and SUBJ with an OBL*by* or an existentially bound variable. The advent of LMT brought monotonicity to the lexicon as well. The HPSG lexicon is also monotonic, if lexical rules are formulated as unary branching rules (see Davis & Koenig 2021: Section 5.1, Chapter 4 of this volume).

# **9 Long distance dependencies**

In LFG a long distance dependency is modeled as a reentrancy in the f-structure. The HPSG theory of long distance dependencies is based on that of GPSG (Gazdar 1981) and uses the percolation of a SLASH feature through the constituent structure. But LFG and HPSG accounts are essentially very similar, both working by decomposing a long distance dependency into a series of local dependencies. As we will see, there are nevertheless some minor differences with respect to what hypothetical extraction patterns can be expressed.

Both frameworks allow accounts either with or without gaps: regarding LFG see Bresnan et al. (2016: Chapter 9) for gaps and Dalrymple et al. (2019: Chapter 17) for gapless; regarding HPSG see Pollard & Sag (1994: Chapter 4) for gaps and Sag et al. (2003: Chapter 14) for gapless. Gaps have been motivated by the (controversial) claim that the linear position of an empty category matters for the purpose of weak crossover and other binding phenomena (Bresnan et al. 2016: 210–223). In this section we compare gapless accounts.

LFG has two grammaticalized discourse functions, TOP (topic) and FOC (focus). A sentence with a left-adjoined topic position is depicted in Figure 6. The topic phrase *Ann* serves as the object of the verb *like* within the clausal complement of *think*. This dependency is encoded in the second equation annotating the topic node, where the variable α ranges over strings of attributes representing grammatical functions such as SUBJ, OBJ, OBL, or COMP. These strings describe paths through the f-structure. In this example α is resolved to the string COMP OBJ, so this equation has the effect of adding to the f-structure in (27) the curved line representing token identity.

Figure 6: Long distance dependencies in LFG

HPSG accounts are broadly similar. One HPSG version relaxes the requirement that the arguments specified in the lexical entry of a verb or other predicator must all appear in its valence lists. Arguments are represented by elements of the ARG-ST list, so the list for the verb *like* contains two NPs, one each for the subject and object. In a sentence with no extraction, those ARG-ST list items map to the valence lists, the first item appearing in SUBJ and any remaining ones in COMPS. To allow for extraction, one of those ARG-ST list items is permitted to appear on the SLASH list instead. The SLASH list item is then passed up the tree by means of strictly local constraints, until it is bound by the topicalized phrase (see Borsley & Crysmann 2021, Chapter 13 of this volume and Bouma et al. 2001).
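A sketch of that division of ARG-ST and the local percolation step follows; feature names are simplified and this is not a faithful HPSG type signature.

```python
# Sketch of the gapless extraction story: one ARG-ST item is realized
# on SLASH rather than on a valence list, and SLASH is passed up node
# by node until a filler binds it. Illustrative only.

def realize(arg_st, extracted=None):
    """Split ARG-ST into SUBJ, COMPS and SLASH values."""
    local = [a for a in arg_st if a != extracted]
    return {"SUBJ": local[:1],
            "COMPS": local[1:],
            "SLASH": [extracted] if extracted else []}

def pass_up(daughters):
    """Local constraint: the mother's SLASH collects the daughters'
    unbound SLASH items."""
    return [g for d in daughters for g in d.get("SLASH", [])]

like = realize(["NP[subj]", "NP[obj]"], extracted="NP[obj]")
print(like)                        # object on SLASH, COMPS empty
vp = {"SLASH": pass_up([like])}    # and so on up to the filler
print(vp)
```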

The LFG dependency is expressed in the f-structure, not the c-structure. Bresnan et al. (2016: Chapter 2) note that this allows for category mismatches between the phrases serving as filler and those in the canonical, unextracted position. This was discussed in Section 4 and illustrated with example (8) above.

Constraints on extraction such as accessibility conditions and island constraints can be captured in LFG by placing constraints on the attribute string α (Dalrymple et al. 2019: Chapter 17).<sup>16</sup> If subjects are not accessible to extraction, then we stipulate that SUBJ cannot be the final attribute in α; if subjects are islands, then we stipulate that SUBJ cannot be a non-final attribute in α. If the f-structure is the only place such constraints are stated, then this makes the interesting (but unfortunately false; see presently) prediction that the theory of extraction cannot distinguish between constituents that map to the same f-structure. For example, as noted in Section 5, function words like determiners and their contentful sisters like NP are usually assumed to be f-structure co-heads, so the DP *the lion* maps to the same f-structure as its daughter *lion* (see Figure 3). This predicts that if the DP can be extracted, then so can the NP, but of course that is not true:

	- a. The lion, I think she saw.
	- b. \* Lion, I think she saw the.

These two sentences have exactly the same f-structures, so any explanation for the contrast in acceptability must involve some other level. For example, one could posit that the phrase structure rules can introduce some items obligatorily (see Snijders 2015: 239), such as an obligatory sister of the determiner *the*.<sup>17</sup>
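Setting the DP/NP problem aside, the path-based statement of accessibility and island conditions itself is easy to make concrete; in the sketch below α is modeled as a list of attribute names, and the two stipulations are toy stand-ins for real constraints.

```python
# Sketch of stating extraction conditions on the uncertainty path:
# alpha is a list of attribute names; both conditions are toy
# stand-ins for actual accessibility/island constraints.

def path_ok(alpha):
    if alpha and alpha[-1] == "SUBJ":
        return False     # subjects inaccessible to extraction
    if "SUBJ" in alpha[:-1]:
        return False     # subjects are islands
    return True

print(path_ok(["COMP", "OBJ"]))   # True: the dependency in Figure 6
print(path_ok(["COMP", "SUBJ"]))  # False: extracting a subject
print(path_ok(["SUBJ", "OBJ"]))   # False: extracting out of a subject
```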

# **10 Nonconfigurationality**

Some languages make heavy use of case and agreement morphology to indicate the grammatical relations, while allowing very free ordering of words within the clause. Such radically nonconfigurational syntax receives a straightforward analysis in LFG, due to the autonomy of functional structure from constituent structure. Indeed, the notion that phrasal position and word-internal morphology can be functionally equivalent is a foundational motivation for that separation of structure and function, as noted in Section 2 above. As Bresnan et al. (2016: 5) observe, "The idea that words and phrases are alternative means of expressing the same grammatical relations underlies the design of LFG, and distinguishes it from other formal syntactic frameworks."

<sup>16</sup>Asudeh (2012) shows that LFG's *off-path constraints* (Dalrymple et al. 2019: 225–230) can even capture quite strict locality conditions on extraction, in the spirit of *successive cyclicity* in movement-based accounts (Chomsky 1973; 1977), but without movement/transformations.

<sup>17</sup>This would depart from the assumption that all nodes are optional, adopted in Bresnan et al. (2016).

The LFG treatment of nonconfigurationality will be illustrated with a simplified analysis, from Bresnan et al. (2016: 352–353), of the clausal syntax of Warlpiri, a Pama-Nyungan language of northern Australia. The following example gives three of the many possible grammatical permutations of words, all expressing the same truth-conditional content.

	- a. [Kurdu-jarra-rlu wita-jarra-rlu] ka-pala maliki wajilipi-nyi. (Warlpiri)
child-DUAL-ERG small-DUAL-ERG PRES-DUAL dog.ABS chase-NONPAST
'The two small children are chasing the dog.'
	- b. Kurdu-jarra-rlu ka-pala maliki wajilipi-nyi wita-jarra-rlu.
child-DUAL-ERG PRES-DUAL dog.ABS chase-NONPAST small-DUAL-ERG
'The two small children are chasing the dog.'
	- c. Maliki ka-pala kurdu-jarra-rlu wajilipi-nyi wita-jarra-rlu.
dog.ABS PRES-DUAL child-DUAL-ERG chase-NONPAST small-DUAL-ERG
'The two small children are chasing the dog.'

The main constraint on word order is that the auxiliary (here, the word *ka-pala*) must immediately follow the first daughter of S, where that first daughter can be any other word in the sentence, or else a multi-word NP as in (29a). Apart from that constraint, all word orders are possible. Any word or phrase in the clause can precede the auxiliary, and the words following the auxiliary can appear in any order.

The LFG analysis of these sentences works by directly specifying the auxiliary-second constraint within the c-structure rule in (30a).<sup>18</sup> Then the lexical entries directly specify the case, number, and other grammatical features of the word forms, including case-assignment properties of the verb (see (32)). The framework does the rest, licensing all and only grammatical word orderings and generating an appropriate f-structure (see (33)).<sup>19</sup>

<sup>18</sup>The c-structure is slightly simplified for illustrative purposes. In the actual c-structure proposed for Warlpiri, the second position auxiliary is the c-structure head (of IP) taking a flat S as its right sister (see Austin & Bresnan 1996: 225). Because the IP functional projection and its complement S map to the same f-structure (as discussed in Section 5 above), the analysis presented here works in exactly the same way regardless of whether this extra structure appears.


<sup>19</sup>The value of LFG's ADJ feature is a set of f-structures, as there can be multiple adjuncts, in fact indefinitely many. We use the set membership symbol as an attribute (Dalrymple et al. 2019: 229–230), which results in the f-structure for 'small' being in a set.


The functional annotations on the NP nodes (see (31)) can vary as long as they secure Completeness and Coherence, given the governing predicate. In this case the main verb is transitive, so SUBJ and OBJ must be realized, each with exactly one PRED value.<sup>20</sup> The noun *wita* 'small' is of category N, as Warlpiri does not distinguish between nouns and adjectives; but it differs functionally from the nouns for 'child' and 'dog' in that it modifies another noun. This is indicated by embedding its PRED feature under the ADJ ("adjunct") attribute (see the first equation in (32c)).

Comparing HPSG accounts of nonconfigurationality is instructive. Two HPSG approaches are described in Müller (2021), Chapter 10 of this volume: the *order domain* approach (Donohue & Sag 1999) and the *non-cancellation* approach (Bender 2008). In the order domain approach, the words of Warlpiri are combined into constituent structures resembling those of a configurational language like English. For example, in an order domain analysis of the Warlpiri sentences in (29), as well as all other acceptable permutations of those word forms, the words for 'two small children' together form an NP constituent. However, a domain feature DOM lists the phonological forms of the words in each constituent, and allows that list order to vary freely relative to the order of the daughters. This effective shuffling of the DOM list applies recursively on the nodes of the tree, up to the clausal node. It is the DOM list order that determines the order of words for pronunciation of the sentence. That function of the DOM feature is carried out in LFG by the c-structure.
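A sketch of the order-domain idea follows, under a deliberately permissive assumption (a completely free shuffle of the daughters' domain elements); real order-domain analyses constrain the interleavings.

```python
# Sketch of the order-domain idea: constituency is fixed, but a
# node's DOM list may be an interleaving of its daughters' DOM
# lists. For brevity we allow a completely free shuffle, which is
# more permissive than actual order-domain analyses.

from itertools import permutations

def dom_orders(daughter_doms):
    words = [w for dom in daughter_doms for w in dom]
    return {tuple(p) for p in permutations(words)}

np = ["kurdu-jarra-rlu", "wita-jarra-rlu"]   # 'two small children' (ERG)
vp = ["maliki", "wajilipi-nyi"]              # 'dog chase'
orders = dom_orders([np, vp])
print(len(orders))                           # 24 linearizations of 4 words
print(("maliki", "kurdu-jarra-rlu",
       "wajilipi-nyi", "wita-jarra-rlu") in orders)  # True: a split NP
```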

<sup>20</sup>By the Principle of Completeness, both SUBJ and OBJ appear in the f-structure; by the Principle of Coherence, no other governable grammatical function appears in the f-structure.

The *non-cancellation* approach effectively introduces into HPSG correlates of c-structure and f-structure. In essence, an f-structure is added to HPSG in the form of a feature, much like the FS feature in the pseudo-HPSG rule in (7b) above. Instead of FS, that feature is called COMPS and has a list value. Unlike the valence feature normally called COMPS, items of this COMPS feature are not canceled from the list (and unlike ARG-ST, this feature is shared between a phrase and its head daughter, so it appears on non-terminal nodes). The items of that list are referenced by their order in the list. Special phrasal types define grammatical relations between a non-head daughter and an item in the COMPS list of the head daughter. These phrasal types are equivalent to LFG annotated phrase structure rules. For example, suppose the second item in COMPS (*2nd-comp*) is the object. Then *head-2nd-comp-phrase* in Bender (2008: 12) is equivalent to an LFG rule where the non-head daughter has the OBJ annotation (see (31a)). Since the list item is not canceled from the list, it remains available for other items to combine with and to modify the object, using a different phrasal type. Non-cancellation mechanisms bring HPSG closer to LFG by relying on a level of structure that is autonomous from the constituent structure responsible for the grouping and ordering of words for pronunciation. See also Müller (2008) for a non-cancellation approach to depictive predicates in English and German.

# **11 Raising and control**

Raising and control (equi) words can be treated in virtually the same way in LFG and HPSG. Taking raising first, a subject raising word (such as *seem* in (34)) specifies that its subject is (also) the subject of its predicate complement (see Abeillé (2021), Chapter 12 of this volume).

(34) Pam seems to visit Fred.

a. *seem*: (↑ PRED) = 'seem⟨(↑ XCOMP)⟩(↑ SUBJ)'
(↑ SUBJ) = (↑ XCOMP SUBJ)

b. *seem*: [ARG-ST ⟨ [1] NP, VP[*inf*, SUBJ ⟨ [1] ⟩] ⟩]

The LFG entry for *seem* in (34a) contains the grammatical function XCOMP ("open complement"), the function reserved for predicate complements such as the infinitival phrase *to visit Fred*. The functional control equation specifies that its subject is identical to the subject of the verb *seem*; the tag [1] plays the same role in the simplified HPSG entry in (34b).

The f-structure for (34) is shown in (35) (with simplified structures for *Pam* and *Fred*):

(35) [ PRED 'seem⟨(↑ XCOMP)⟩(↑ SUBJ)'
SUBJ [1] [ PRED 'Pam' ]
XCOMP [ PRED 'visit⟨(↑ SUBJ) (↑ OBJ)⟩'
SUBJ [1]
OBJ [ PRED 'Fred' ] ] ]

Turning next to equi, similar proposals have been made in both frameworks, such that the referential indices of the controller and the controllee are identical:

(36) Pam hopes to visit Fred.

a. *hope*: (↑ PRED) = 'hope⟨(↑ SUBJ) (↑ COMP)⟩'
(↑ COMP SUBJ PRED) = 'pro'
(↑ SUBJ INDEX) = (↑ COMP SUBJ INDEX)

b. *hope*: [ARG-ST ⟨ NP<sub>1</sub>, VP[*inf*, SUBJ ⟨ NP<sub>1</sub> ⟩] ⟩]


The LFG entry for *hope* in (36a) is adapted from Dalrymple et al. (2019: 572). It states that the subject of the controlled infinitival is a pronominal that is coindexed with the subject of the control verb. Similarly, the subscripted tags in (36b) represent coindexing between the subject of the control verb and the controlled subject of the complement.
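The formal difference between the two entries can be sketched as object identity versus index equality; the dict representation is our own simplification.

```python
# Sketch of the contrast between (34) and (36): raising shares the
# very same f-structure token, equi shares only a referential index.

pam = {"PRED": "'Pam'", "INDEX": "i"}

seem = {"SUBJ": pam,                       # (34): functional control
        "XCOMP": {"PRED": "'visit<SUBJ OBJ>'", "SUBJ": pam}}
print(seem["SUBJ"] is seem["XCOMP"]["SUBJ"])   # True: token identity

hope = {"SUBJ": pam,                       # (36): anaphoric control
        "COMP": {"PRED": "'visit<SUBJ OBJ>'",
                 "SUBJ": {"PRED": "'pro'", "INDEX": "i"}}}
print(hope["SUBJ"] is hope["COMP"]["SUBJ"])    # False: distinct f-structures
print(hope["SUBJ"]["INDEX"] ==
      hope["COMP"]["SUBJ"]["INDEX"])           # True: coindexed
```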

One interesting difference between the two frameworks concerns the representation of restrictions on the grammatical function of the target of control or raising. The basic HPSG theory of control and raising (for example, the one presented in Pollard & Sag 1994: 132–145) allows only for control (or raising) of subjects and not complements. More precisely, it allows for control/raising of the outermost or final dependent to be combined with the verbal projection that is a complement of the control verb. This is because of the list cancellation regime that operates with valence lists (on non-cancellation theories, see Section 10). The expression VP in (34b/36b) represents an item with an empty COMPS list. In a simple English clause, the verb combines with its complement phrases to form a VP constituent, with which the subject is then combined to form a clause. Assuming the same order of combination in the control or raising structure, it is not possible to raise or control the complement of a structure that contains a structural subject, as in (37a):

(37) b. Fred seems to be visited by Pam.

The intended meaning would be that of (37b). The passive voice is needed in order to make *Fred* the subject of *visit* and thus available to be raised. This restriction that only subjects can be raised follows from the basic HPSG theory of Pollard & Sag (1994: 132–145), while in LFG it follows only if raising equations like the one in (38) are systematically excluded.

$$(38) \quad (\uparrow \text{SUBJ}) = (\uparrow \text{XCOMP OBJ})$$

At the same time, the HPSG framework allows for mechanisms that can be used to violate the restriction to subjects, and such mechanisms have been proposed, including the adoption of something similar to an f-structure in HPSG (this is the non-cancellation theory described in Section 10). This illustrates the point made in Section 2 above, that the framework was originally designed to capture locality conditions, but is flexible enough to capture non-local relations as well.

This raises the question of whether the restriction to subject controllees is universal. In fact, it appears that some languages allow the control of non-subjects, but it is still unclear whether these control relations are established via the grammatical relations and therefore justify equations such as (38). For instance, Kroeger (1993) shows that Tagalog has two types of control relation. In the more specialized type, which occurs only with a small set of verbs or in a special construction in which the downstairs verb appears in non-volitive mood, both the controller and controllee must be subjects. Kroeger analyzes this type using a functional control equation like the one in (34a). In the more common type of Tagalog control, the controllee must be the Actor argument, while the grammatical relations of controllee and controller are not restricted. (Tagalog has a rich voice system, often called a focus marking system, regulating which argument of a verb is selected as its subject.) This latter type of Tagalog control is defined on argument structure (Actors, etc.), so a-structure rather than f-structure is appropriate for representing the control relations.

# **12 Semantics**

HPSG was conceived from the start as a theory of the *sign* (de Saussure 1916), wherein each constituent is a pairing of form and meaning. So semantic representation and composition were built into HPSG (and the related framework of Sign-Based Construction Grammar; Boas & Sag 2012), as reflected in the title of the first HPSG book (Pollard & Sag 1987), *Information-Based Syntax and Semantics*. LFG was not founded as a theory that included semantics, but a semantic component was developed for LFG shortly after its foundation (Halvorsen 1983). The direction of semantics for LFG changed some ten years later and the dominant tradition is now Glue Semantics (Dalrymple et al. 1993; Dalrymple 1999; 2001; Asudeh 2012; Dalrymple et al. 2019).

This section presents a basic introduction to Glue Semantics (Glue); this is necessary to fully understand a not insignificant portion of LFG literature of the past fifteen years, which interleaves LFG syntactic analysis with Glue semantic analysis. The section is not meant as a direct comparison of LFG and HPSG semantics, for two reasons. First, as explained in the previous paragraph, HPSG is inherently a theory that integrates syntax and semantics, but LFG is not; the semantic module that Glue provides for LFG can easily be pulled out, leaving the syntactic component exactly the same.<sup>21</sup> Second, as will become clear in the next section, at a suitable level of abstraction, Glue offers an underspecified theory of semantic composition, in particular scope relations, which is also the goal of an influential HPSG semantic approach, Minimal Recursion Semantics (MRS; Copestake et al. 2005). But beyond observing this big-picture commonality, comparison of Glue and MRS would require a chapter in its own right. Our goal is to present enough of Glue Semantics for readers to grasp the main intuitions behind it, without presupposing much knowledge of formal semantic theory. The references listed at the end of the previous paragraph (especially Dalrymple et al. 2019) are good places to find additional discussion and references.

<sup>21</sup>On the relation between the PRED feature and the semantic component, see footnote 24 below.

The rest of this section is organized as follows. In Section 12.1 we present some more historical background on semantics for LFG and HPSG. In Section 12.2, we present Glue Semantics, as a general compositional system in its own right. Then, in Section 12.3, we look at the syntax–semantics interface with specific reference to an LFG syntax. For further details on semantic composition and the syntax– semantics interface in constraint-based theories of syntax, see Koenig & Richter (2021), Chapter 22 of this volume for semantics for HPSG and Asudeh (2021) for Glue Semantics for LFG.

# **12.1 Brief history of semantics for LFG and HPSG**

Various theories of semantic representation have been adopted by the different non-transformational syntactic frameworks over the years. The precursor to HPSG, GPSG (Gazdar et al. 1985), was paired by its designers with a then fairly standard static Montagovian semantics (Montague 1973), but GPSG itself was subsequently adopted as the syntactic framework used by Kamp & Reyle (1993: 9) for Discourse Representation Theory, a dynamic theory of semantics. Initial work on semantics for LFG also assumed a Montagovian semantics (Halvorsen 1983; Halvorsen & Kaplan 1988). But with the increasing interest in Situation Semantics (Barwise & Perry 1983) in the 1980s at Stanford University and environs (particularly SRI International and Xerox PARC), the sites of the foundational work on both HPSG and LFG, both frameworks incorporated a Situation Semantics component (on LFG see Fenstad et al. 1987). Interest in the use of Situation Semantics did not last as long in LFG as it did in HPSG, where Situation Semantics was carried over into the second main HPSG book (Pollard & Sag 1994) and beyond (Ginzburg & Sag 2000).

Beginning in the nineties, the focus subsequently shifted in new directions due to a new interest in computationally tractable theories of the syntax–semantics interface, to support efforts at large-scale grammar development, such as the ParGram project for LFG (Butt et al. 1999; 2002) and the LinGO/Grammar Matrix and CoreGram projects for HPSG (Flickinger 2000; Bender et al. 2002; 2010; Müller 2015).<sup>22</sup> This naturally led to an interest in underspecified semantic representations, so that semantic ambiguities such as scope ambiguity could be compactly encoded without the need for full enumeration of all scope possibilities. Two examples for HPSG are *Lexical Resource Semantics* (Richter 2004; Penn & Richter 2004) and *Minimal Recursion Semantics* (Copestake et al. 2005). Similarly, focus in semantics for LFG shifted to ways of encoding semantic ambiguity compactly and efficiently. This led to the development of Glue Semantics.

<sup>22</sup>Readers can explore the current incarnations of these projects at the following links (checked 2021-04-30):

# **12.2 General Glue Semantics**

In this section, we briefly review Glue Semantics itself, without reference to a particular syntactic framework. Glue Semantics is a general framework for semantic composition that requires *some* independent syntactic framework but does not presuppose anything about syntax except headedness, which is an uncontroversial assumption across frameworks. This makes the system flexible and adaptable, and it has been paired not just with LFG, but also with Lexicalized Tree-Adjoining Grammar (Frank & van Genabith 2001), HPSG (Asudeh & Crouch 2002b), Minimalism (Gotham 2018), and Universal Dependencies (Gotham & Haug 2018).

In Glue Semantics, meaningful linguistic expressions—including lexical items but possibly also particular syntactic configurations—are associated with *meaning constructors* of the following form:<sup>23</sup>

(39) M : G

M is an expression from a *meaning language* which can be anything that supports the lambda calculus; G is an expression of *linear logic* (Girard 1987), which specifies the semantic composition (it "glues meanings together"), based on a syntactic parse. By convention a colon separates them. Glue Semantics is related to (Type-Logical) Categorial Grammar (Carpenter 1998; Morrill 1994; 2011; Moortgat 1997), but the terms of the linear logic specify just semantic composition without regard to word order (see Asudeh 2012 for further discussion). Glue Semantics is therefore useful in helping us focus on semantic composition in its own right.

The principal compositional rules for Glue Semantics are those for the linear implication connective, ⊸, which are here presented in a natural deduction format, in which each connective has an elimination rule (⊸E, in this case) and an introduction rule (⊸I, in this case).

<sup>23</sup>It is in principle possible for a linguistic expression to have a phonology and syntax but not contribute to interpretation, such as the expletives *there* and *it* or the *do*-support auxiliary in English; see Asudeh (2012: 113) for some discussion of expletive pronouns in the context of Glue.


(40) Functional application : Implication elimination (modus ponens)

$$\frac{f : A \multimap B \qquad a : A}{f(a) : B}\;\multimap_{\mathcal{E}}$$

(41) Functional abstraction : Implication introduction (hypothetical reasoning)

$$\begin{array}{c}
[a : A]^1\\
\vdots\\
\dfrac{f : B}{\lambda a.f : A \multimap B}\;\multimap_{\mathcal{I},1}
\end{array}$$

Focusing first on the right-hand, linear logic side, the implication elimination rule is just standard modus ponens. The implication introduction rule is hypothetical reasoning. A hypothesis is made in the first line as an assumption, indicated by presenting it in square brackets with an index that flags the particular hypothesis/assumption. Given this hypothesis, if through some series of proof steps, indicated by the vertical ellipsis, we derive a term, then we are entitled to discharge the assumption, using its flag to indicate that it is this particular assumption that has been discharged, and conclude that the hypothesis implies the term so-derived. In each of these rules, the inference over the linear logic term corresponds to an operation on the meaning term, via the Curry-Howard Isomorphism between formulas and types (Curry & Feys 1958; Howard 1980). The rule for eliminating the linear implication corresponds to functional application. The rule for introducing the linear implication corresponds to functional abstraction. These rules will be seen in action shortly.
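The following is a minimal runnable sketch of these two rules, with meaning terms modeled as Python functions and glue terms as nested linear implications; the MC/Impl encoding is invented for illustration and is not a standard Glue implementation.

```python
# Meaning constructors M : G, with -oE and -oI as operations on them.

from dataclasses import dataclass

@dataclass(frozen=True)
class Impl:        # the linear implication A -o B
    ant: object
    cons: object

@dataclass
class MC:          # a meaning constructor M : G
    meaning: object
    glue: object   # an atomic resource label (str) or an Impl

def elim(f: MC, a: MC) -> MC:
    """-oE (modus ponens): from f : A -o B and a : A, conclude f(a) : B."""
    assert isinstance(f.glue, Impl) and f.glue.ant == a.glue, "resource mismatch"
    return MC(f.meaning(a.meaning), f.glue.cons)

def intro(hyp_glue, derive) -> MC:
    """-oI (hypothetical reasoning): assume a : A, derive f : B via `derive`,
    then discharge the assumption to conclude λa.f : A -o B."""
    result_glue = derive(MC("hyp", hyp_glue)).glue   # probe run for the type
    return MC(lambda a: derive(MC(a, hyp_glue)).meaning,
              Impl(hyp_glue, result_glue))

# Demo of -oE: likes : s -o m -o l composed with sam : s, then max : m.
likes = MC(lambda y: lambda x: ("like", y, x), Impl("s", Impl("m", "l")))
result = elim(elim(likes, MC("sam", "s")), MC("max", "m"))
print(result.meaning, ":", result.glue)   # ('like', 'sam', 'max') : l
```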

In general, given some head *h* and some arguments of the head *a*₁, …, *a*ₙ, an implicational term like the following models consumption of the arguments to yield the saturated meaning of the head: *a*₁ ⊸ … ⊸ *a*ₙ ⊸ *h*. For example, let us assume the following meaning constructor for the verb *likes* in the sentence *Max likes Sam*:

(42) λy.λx.**like**(y)(x) : s ⊸ m ⊸ l

Let's also assume that *s* is mnemonic for the semantic correspondent of the (single word) phrase *Sam*, *m* similarly mnemonic for *Max*, and *l* for *likes*. In other words, the meaning constructor for *likes* would be associated with the lexical entry for the verb and specified in some general form such that it can be instantiated by the syntax (we will see an LFG example shortly); here we are assuming that the instantiation has given us the meaning constructor in (42).


Given this separate level of syntax, the glue logic does not have to worry about word order and is permitted to be commutative (unlike the logic of Categorial Grammar, see also Kubota 2021, Chapter 29 of this volume on Categorial Grammar and Müller 2021: 379, Chapter 10 of this volume on HPSG approaches allowing saturation of elements from the valence lists in arbitrary order). We could therefore freely reorder the arguments for *likes* in (42) above, as in (43) below, such that we instead first compose with the subject and then the object, but still yield the meaning appropriate for the intended sentence *Max likes Sam* (rather than for *Sam likes Max*):

(43) λx.λy.**like**(y)(x) : m ⊸ s ⊸ l

As we will see below, the commutativity of the glue logic yields a simple and elegant treatment of quantifiers in non-subject positions, which are challenging for other frameworks (see, for example, the careful pedagogical presentation of the issue in Jacobson 2014: 244–263).

First, though, let us see how this argument reordering, otherwise known as Currying or Schönfinkelization, works in a proof, which also demonstrates the rules of implication elimination and introduction:

$$
\text{(44)}\quad
\begin{array}{l}
\dfrac{\lambda y.\lambda x.f(y)(x) : a \multimap b \multimap c \qquad [v : a]^1}{\lambda x.f(v)(x) : b \multimap c}\;\multimap_{\mathcal{E}}\\[2ex]
\dfrac{\lambda x.f(v)(x) : b \multimap c \qquad [u : b]^2}{f(v)(u) : c}\;\multimap_{\mathcal{E}}\\[2ex]
\dfrac{f(v)(u) : c}{\lambda v.f(v)(u) : a \multimap c}\;\multimap_{\mathcal{I},1}
\qquad \Rightarrow_\alpha \quad \lambda y.f(y)(u) : a \multimap c\\[2ex]
\dfrac{\lambda y.f(y)(u) : a \multimap c}{\lambda u.\lambda y.f(y)(u) : b \multimap a \multimap c}\;\multimap_{\mathcal{I},2}
\qquad \Rightarrow_\alpha \quad \lambda x.\lambda y.f(y)(x) : b \multimap a \multimap c
\end{array}
$$

The general structure of the proof is as follows. First, an assumption (hypothesis) is formed for each argument, in the order in which they originally occur, corresponding to a variable in the meaning language. Each assumed argument is then allowed to combine with the implicational term by implication elimination. Once the implicational term has been entirely reduced, the assumptions are then discharged in the same order that they were made, through iterations of implication introduction. The result is the original term in curried form, such that the order of arguments has been reversed but without any change in meaning. The two steps of α-equivalence, notated ⇒α, are of course not strictly necessary, but have been added for exposition.
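Under the same stand-in encoding of meanings as functions, the harmlessness of reordering can be checked directly; the tuple representation of the **like** relation is, again, only illustrative.

```python
# Argument reordering (Currying/Schönfinkelization) preserves meaning:
# plain Python lambdas stand in for the meaning-language side of (42)/(43).

like = lambda y: lambda x: ("like", y, x)      # (42): consumes the object first
like_swapped = lambda x: lambda y: like(y)(x)  # (43): consumes the subject first

# Either order of composition yields the reading of "Max likes Sam":
print(like("sam")("max"))          # ('like', 'sam', 'max')
print(like_swapped("max")("sam"))  # ('like', 'sam', 'max') -- same meaning
```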

This presentation has been purposefully abstract to highlight what is intrinsic to the glue logic, but we need to see how this works with a syntactic framework to see how Glue Semantics actually handles semantic composition and the syntax–semantics interface. So next, in Section 12.3, we will review LFG+Glue.

# **12.3 Glue Semantics for LFG**

Glue for LFG will be demonstrated by analyses of the following three examples:

(45) a. Blake called Alex.
b. Blake called everybody.
c. Everybody called somebody.

Example (45a) is a simple case of a transitive verb with two proper name arguments, but is sufficient to demonstrate the basics of the syntax–semantics interface in LFG+Glue. Example (45b) is a case of a quantifier in object position, which is challenging to compositionality because there is a type clash between the simplest type we can assign to the verb, ⟨e, ⟨e, t⟩⟩, and the simplest type that would be assigned to the quantifier, ⟨⟨e, t⟩, t⟩. In other theories, this necessitates either a syntactic operation which is undermotivated from a purely syntactic perspective, e.g. Quantifier Raising (QR) in interpretive theories of composition, such as Logical Form semantics (May 1977; 1985; Heim & Kratzer 1998), or a type shifting operation of some kind in directly compositional approaches, as in categorial or type-logical frameworks; see Jacobson (2014: Chapter 14) for further discussion and references. Example (45c) also demonstrates this point, but it more importantly demonstrates that quantifier scope ambiguity can be handled in Glue without positing an undermotivated syntactic ambiguity, but nevertheless while maintaining the simplest types for both quantifiers.
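The clash can be made concrete with a toy type checker; the tuple encoding of semantic types is invented for this sketch.

```python
# Applying the simplest transitive-verb type to the simplest quantifier
# type fails, which is the problem (45b) poses for composition.

def apply_type(fn, arg):
    """Function types are (domain, range) pairs; application checks the domain."""
    if isinstance(fn, tuple) and fn[0] == arg:
        return fn[1]
    raise TypeError(f"cannot apply {fn} to {arg}")

verb = ("e", ("e", "t"))         # <e,<e,t>>
quantifier = (("e", "t"), "t")   # <<e,t>,t>

print(apply_type(verb, "e"))     # a proper-name object is fine: ('e', 't')
try:
    apply_type(verb, quantifier)
except TypeError as err:
    print(err)                   # the quantifier object does not fit
```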

The relevant aspects of the lexical entries involved are shown in Table 30.1. Other syntactic aspects of the lexical items, such as the fact that *called* has a SUBJ and an OBJ, are specified in its meaning constructor. Minimal f-structures are provided below for each example. The subscript σ indicates the semantic structure that corresponds to the annotated f-structure term. The types for the lexical items are the minimal types that would be expected. Note that in Glue these are normally associated directly with the semantic structures, for example ↑σ and (↑ OBJ)σ ⊸ (↑ SUBJ)σ ⊸ ↑σ, but they have been presented separately for better exposition; see Dalrymple et al. (2019: 299–305) for further discussion. We do not show semantic structures here, as they are not necessary for this simple demonstration.

The functions associated with *everybody* and *somebody* are, respectively, **every** and **some** in the meaning language, where these are the standard quantificational determiners from generalized quantifier theory (Montague 1973; Barwise & Cooper 1981; Keenan & Faltz 1985). The function **every** returns true iff the set characterized by its restriction is a subset of the set characterized by its scope. The function **some** returns true iff the intersection of the set characterized by its restriction and the set characterized by its scope is non-empty. The universal quantification symbol ∀ in the glue logic/linear logic terms for the quantifiers ranges over semantic structures of type t. It is unrelated to the meaning language functions **every** and **some**. Hence even the existential word *somebody* has the universal ∀ in its linear logic glue term. The ∀-terms thus effectively say that *any* type-t semantic structure X that can be found by application of proof rules, such that the quantifier's semantic structure implies X, can serve as the scope of the quantifier; see Asudeh (2005: 393–394) for basic discussion of the interpretation of ∀ in linear logic. This will become clearer when quantifier scope is demonstrated shortly.
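Stated extensionally over a toy model, the two prose definitions come out as follows; the individuals and the model itself are invented, and sets stand in for the characteristic functions of properties.

```python
# every = subset of the scope; some = non-empty intersection with the scope.

def every(restriction, scope):
    return restriction <= scope        # restriction is a subset of the scope

def some(restriction, scope):
    return bool(restriction & scope)   # the intersection is non-empty

person = {"blake", "alex", "sam"}
called_by_blake = {"alex", "sam"}

print(every(person, called_by_blake))  # False: not everybody was called
print(some(person, called_by_blake))   # True
```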

Table 30.1: Relevant lexical details for the three examples in (45)

| Word | Meaning constructor | Type |
|------|---------------------|------|
| *Blake* | **blake** : ↑σ | e |
| *Alex* | **alex** : ↑σ | e |
| *called* | λy.λx.**call**(y)(x) : (↑ OBJ)σ ⊸ (↑ SUBJ)σ ⊸ ↑σ | ⟨e, ⟨e, t⟩⟩ |
| *everybody* | λP.**every**(**person**, P) : ∀X.(↑σ ⊸ X) ⊸ X | ⟨⟨e, t⟩, t⟩ |
| *somebody* | λP.**some**(**person**, P) : ∀X.(↑σ ⊸ X) ⊸ X | ⟨⟨e, t⟩, t⟩ |
Let us assume the following f-structure for (45a):

$$\text{(46)}\quad c\;\begin{bmatrix}
\text{PRED} & \text{'call'}\\
\text{SUBJ} & b\;\big[\,\text{PRED 'Blake'}\,\big]\\
\text{OBJ} & a\;\big[\,\text{PRED 'Alex'}\,\big]
\end{bmatrix}$$

Note that here, unlike in previous sections, the PRED value for the verb does not list its subcategorization information. This is because we've made the move that is standard in much Glue work to suppress this information.<sup>24</sup> The f-structures are named mnemonically by the first character of their PRED value. All other f-structural information has been suppressed for simplicity. Based on these f-structure labels, the relevant meaning constructors in the lexicon in Table 30.1 are instantiated as follows (σ subscripts suppressed):

(47) Instantiated meaning constructors:
**blake** : b
**alex** : a
λy.λx.**call**(y)(x) : a ⊸ b ⊸ c

<sup>24</sup>Indeed, one could go further and argue that PRED values do not list subcategorization at all, in which case the move is not just notational, and that the Principles of Completeness and Coherence instead follow from the resource-sensitivity of Glue Semantics; for some discussion, see Asudeh (2012: 112–114) and Dalrymple et al. (2019: 299–301).

These meaning constructors yield the following proof, which is the only available normal form proof for the sentence, where ⇒β indicates β-equivalence:<sup>25</sup>

(48) Proof:

$$
\begin{array}{l}
\dfrac{\lambda y.\lambda x.\textbf{call}(y)(x) : a \multimap b \multimap c \qquad \textbf{alex} : a}{\lambda x.\textbf{call}(\textbf{alex})(x) : b \multimap c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\lambda x.\textbf{call}(\textbf{alex})(x) : b \multimap c \qquad \textbf{blake} : b}{\textbf{call}(\textbf{alex})(\textbf{blake}) : c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta
\end{array}
$$

The final meaning language expression, **call**(**alex**)(**blake**), gives the correct truth conditions for *Blake called Alex*, based on a standard model theory.

Let us next assume the following f-structure for (45b):

$$\text{(49)}\quad c\;\begin{bmatrix}
\text{PRED} & \text{'call'}\\
\text{SUBJ} & b\;\big[\,\text{PRED 'Blake'}\,\big]\\
\text{OBJ} & e\;\big[\,\text{PRED 'everybody'}\,\big]
\end{bmatrix}$$

Based on these f-structure labels, the meaning constructors in the lexicon are instantiated as follows (σ subscripts again suppressed):

(50) Instantiated meaning constructors:
λy.λx.**call**(y)(x) : e ⊸ b ⊸ c
λP.**every**(**person**, P) : ∀X.(e ⊸ X) ⊸ X
**blake** : b

These meaning constructors yield the following proof, which is again the only available normal form proof:<sup>26</sup>

<sup>25</sup>The reader can think of the normal form proof as the minimal proof that yields the conclusion, without unnecessary steps of introducing and discharging assumptions; see Asudeh & Crouch (2002a) for some basic discussion.

<sup>26</sup>We have not presented the proof rule for Universal Elimination, ∀E, but it is trivial; see Asudeh (2012: 396).


(51) Proof:

$$
\begin{array}{l}
\dfrac{\lambda y.\lambda x.\textbf{call}(y)(x) : e \multimap b \multimap c \qquad [v : e]^1}{\lambda x.\textbf{call}(v)(x) : b \multimap c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\lambda x.\textbf{call}(v)(x) : b \multimap c \qquad \textbf{blake} : b}{\textbf{call}(v)(\textbf{blake}) : c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\textbf{call}(v)(\textbf{blake}) : c}{\lambda v.\textbf{call}(v)(\textbf{blake}) : e \multimap c}\;\multimap_{\mathcal{I},1}\\[2ex]
\dfrac{\lambda P.\textbf{every}(\textbf{person}, P) : \forall X.(e \multimap X) \multimap X \qquad \lambda v.\textbf{call}(v)(\textbf{blake}) : e \multimap c}{\textbf{every}(\textbf{person}, \lambda v.\textbf{call}(v)(\textbf{blake})) : c}\;\forall_{\mathcal{E}}\,[c/X],\,\multimap_{\mathcal{E}},\,\Rightarrow_\beta
\end{array}
$$
The final meaning language expression, **every**(**person**, λv.**call**(v)(**blake**)), again gives the correct truth conditions for *Blake called everybody*, based on a standard model theory with generalized quantifiers.

Notice that the quantifier does not move in the syntax, contra QR analyses; see Gotham (2018) for contrastive discussion. The quantifier is just an OBJ in f-structure, and no special type shifting was necessary. This is because the proof rules allow us to temporarily fill the position of the object quantifier with a hypothetical meaning constructor that consists of a variable paired with the linear logic term for the object; this assumption is then discharged to return the scope of the quantifier, e ⊸ c, and the corresponding variable is bound, to yield the function that maps individuals called by Blake to a truth value. In other words, we have demonstrated that this approach scopes the quantifier without positing an *ad hoc* syntactic operation and without complicating the type of the object quantifier or the transitive verb. This is ultimately due to the commutativity of the glue logic, linear logic, since the proof does not have to deal with the elements of composition (words) in their syntactic order, because the syntax is separately represented by c-structure (not shown here) and f-structure.
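A sketch of this hypothetical-reasoning strategy, evaluated over invented model facts: discharging the hypothesis for the object position yields a Python lambda, which is exactly the scope handed to **every**.

```python
# Mirrors proof (51): hypothesize a filler for the object position,
# discharge it to get the scope (the e -o c term), then apply the quantifier.

people = {"blake", "alex", "sam"}
calls = {("blake", "alex"), ("blake", "sam"), ("blake", "blake")}  # invented

call = lambda y: lambda x: (x, y) in calls                # who called whom
every = lambda restr, scope: all(scope(z) for z in restr)

scope = lambda v: call(v)("blake")   # discharge [v : e] -> λv.call(v)(blake)
print(every(people, scope))          # True: Blake called everybody
```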

Lastly, let us assume the following f-structure for (45c), *Everybody called somebody*:

$$\text{(52)}\quad c\;\begin{bmatrix}
\text{PRED} & \text{'call'}\\
\text{SUBJ} & e\;\big[\,\text{PRED 'everybody'}\,\big]\\
\text{OBJ} & s\;\big[\,\text{PRED 'somebody'}\,\big]
\end{bmatrix}$$

Based on these f-structure labels, the meaning constructors in the lexicon are instantiated as follows:

(53) Instantiated meaning constructors:
λy.λx.**call**(y)(x) : s ⊸ e ⊸ c
λP.**some**(**person**, P) : ∀X.(s ⊸ X) ⊸ X
λP.**every**(**person**, P) : ∀X.(e ⊸ X) ⊸ X


These meaning constructors yield the following proofs, which are the only available normal form proofs, but there are two distinct proofs, because of the scope ambiguity:<sup>27</sup>

(54) Proof 1 (subject wide scope):

$$
\begin{array}{l}
\dfrac{\lambda y.\lambda x.\textbf{call}(y)(x) : s \multimap e \multimap c \qquad [u : s]^1}{\lambda x.\textbf{call}(u)(x) : e \multimap c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\lambda x.\textbf{call}(u)(x) : e \multimap c \qquad [v : e]^2}{\textbf{call}(u)(v) : c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\textbf{call}(u)(v) : c}{\lambda u.\textbf{call}(u)(v) : s \multimap c}\;\multimap_{\mathcal{I},1}\\[2ex]
\dfrac{\lambda P.\textbf{some}(\textbf{person}, P) : \forall X.(s \multimap X) \multimap X \qquad \lambda u.\textbf{call}(u)(v) : s \multimap c}{\textbf{some}(\textbf{person}, \lambda u.\textbf{call}(u)(v)) : c}\;\forall_{\mathcal{E}}\,[c/X],\,\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\textbf{some}(\textbf{person}, \lambda u.\textbf{call}(u)(v)) : c}{\lambda v.\textbf{some}(\textbf{person}, \lambda u.\textbf{call}(u)(v)) : e \multimap c}\;\multimap_{\mathcal{I},2}\\[2ex]
\dfrac{\lambda P.\textbf{every}(\textbf{person}, P) : \forall X.(e \multimap X) \multimap X \qquad \lambda v.\textbf{some}(\textbf{person}, \lambda u.\textbf{call}(u)(v)) : e \multimap c}{\textbf{every}(\textbf{person}, \lambda v.\textbf{some}(\textbf{person}, \lambda u.\textbf{call}(u)(v))) : c}\;\forall_{\mathcal{E}}\,[c/X],\,\multimap_{\mathcal{E}},\,\Rightarrow_\beta
\end{array}
$$
(55) Proof 2 (object wide scope):

$$
\begin{array}{l}
\dfrac{\lambda y.\lambda x.\textbf{call}(y)(x) : s \multimap e \multimap c \qquad [u : s]^1}{\lambda x.\textbf{call}(u)(x) : e \multimap c}\;\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\lambda P.\textbf{every}(\textbf{person}, P) : \forall X.(e \multimap X) \multimap X \qquad \lambda x.\textbf{call}(u)(x) : e \multimap c}{\textbf{every}(\textbf{person}, \lambda x.\textbf{call}(u)(x)) : c}\;\forall_{\mathcal{E}}\,[c/X],\,\multimap_{\mathcal{E}},\,\Rightarrow_\beta\\[2ex]
\dfrac{\textbf{every}(\textbf{person}, \lambda x.\textbf{call}(u)(x)) : c}{\lambda u.\textbf{every}(\textbf{person}, \lambda x.\textbf{call}(u)(x)) : s \multimap c}\;\multimap_{\mathcal{I},1}\\[2ex]
\dfrac{\lambda P.\textbf{some}(\textbf{person}, P) : \forall X.(s \multimap X) \multimap X \qquad \lambda u.\textbf{every}(\textbf{person}, \lambda x.\textbf{call}(u)(x)) : s \multimap c}{\textbf{some}(\textbf{person}, \lambda u.\textbf{every}(\textbf{person}, \lambda x.\textbf{call}(u)(x))) : c}\;\forall_{\mathcal{E}}\,[c/X],\,\multimap_{\mathcal{E}},\,\Rightarrow_\beta
\end{array}
$$

The final meaning language expressions in (54) and (55) give the two possible readings for the scope ambiguity, again assuming a standard model theory with generalized quantifiers. Once more, notice that neither quantifier moves in the syntax (again, contra QR analyses): they are respectively just a SUBJ and an OBJ in f-structure. And, once more, no special type shifting is necessary. It is a key strength of this approach that even quantifier scope ambiguity can be captured without positing *ad hoc* syntactic operations (and, again, without complicating the type of the object quantifier or the transitive verb). Again, this is ultimately due to the commutativity of the glue logic.
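The two readings can also be evaluated directly over a toy model; the facts below are invented and chosen so that the readings differ in truth value.

```python
# The two normal-form proofs in (54) and (55) correspond to two readings.

people = {"blake", "alex", "sam"}
calls = {("blake", "alex"), ("alex", "sam"), ("sam", "blake")}

call = lambda y: lambda x: (x, y) in calls
every = lambda restr, scope: all(scope(z) for z in restr)
some = lambda restr, scope: any(scope(z) for z in restr)

# (54) subject wide scope: for every caller there is some callee
print(every(people, lambda v: some(people, lambda u: call(u)(v))))  # True

# (55) object wide scope: some single person was called by everybody
print(some(people, lambda u: every(people, lambda v: call(u)(v))))  # False
```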

<sup>27</sup>We have made the typical move in Glue work of not showing the trivial universal elimination step this time.

# **13 Conclusion**

HPSG and LFG are rather similar syntactic frameworks, both of them important declarative lexicalist alternatives to Transformational Grammar. They allow for the expression of roughly the same set of substantive analyses, where analyses are individuated in terms of deeper theoretical content rather than superficial properties. The same sorts of analytical options can be compared under both systems: answers to questions such as whether a phenomenon is to be captured on a lexical level or in the syntax, whether a given word string is a constituent or not, the proper treatment of complex predicates, and so on. Analyses in one framework can often be translated into the other, preserving the underlying intuition of the account.

Against the backdrop of a general expressive similarity, we have pointed out a few specific places where one framework makes certain modes of analysis available that are not found in the other. The main thesis of this chapter is that the differences between the frameworks stem from different design motivations, reflecting subtly different methodological outlooks. HPSG is historically rooted in context-free grammars and an interest in the study of locality. LFG is based on the notion of functional similarity or equivalence between what are externally rather different structures. For example, fixed phrasal positions, case markers, and agreement inflections can all function similarly in signaling grammatical relations. LFG makes this functional similarity highly explicit.

# **Abbreviations**

FV final vowel

# **Acknowledgments**

We would like to thank the editors of the volume, Anne Abeillé, Bob Borsley, Jean-Pierre Koenig, and Stefan Müller. JP and Stefan provided particularly close reads of the paper that greatly improved it. We would also like to thank the anonymous reviewers for their insightful comments. Any remaining errors are our own.

# **References**

Abeillé, Anne. 2021. Control and Raising. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook* (Empirically Oriented Theoretical Morphology and Syntax), 489–535. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599840.


*the LFG '18 conference, University of Vienna*, 208–226. Stanford, CA: CSLI Publications. http://csli-publications.stanford.edu/LFG/LFG-2018/ (10 February, 2021).



In Mary Dalrymple, Ronald M. Kaplan, John T. Maxwell III & Annie Zaenen (eds.), *Formal issues in Lexical-Functional Grammar* (CSLI Lecture Notes 47), 29–130. Stanford, CA: CSLI Publications, 1995.





*Conference on Head-Driven Phrase Structure Grammar, Center for Computational Linguistics, Katholieke Universiteit Leuven*, 423–443. Stanford, CA: CSLI Publications. http://csli-publications.stanford.edu/HPSG/2004/ (10 February, 2021).




# **Chapter 31**

# **HPSG and Dependency Grammar**

# Richard Hudson

University College London

HPSG assumes Phrase Structure (PS), a partonomy, in contrast with Dependency Grammar (DG), which recognises Dependency Structure (DS), with direct relations between individual words and no multi-word phrases. The chapter presents a brief history of the two approaches, showing that DG matured in the late nineteenth century, long before the influential work by Tesnière, while Phrase Structure Grammar (PSG) started somewhat later with Bloomfield's enthusiastic adoption of Wundt's ideas. Since DG embraces almost as wide a range of approaches as PSG, the rest of the chapter focuses on one version of DG, Word Grammar. The chapter argues that classical DG needs to be enriched in ways that bring it closer to PSG: each dependent actually adds an extra node to the head, but the nodes thus created form a taxonomy, not a partonomy; coordination requires strings; and in some languages the syntactic analysis needs to indicate phrase boundaries. Another proposed extension to bare DG is a separate system of relations for controlling word order, which is reminiscent of the PSG distinction between dominance and precedence. The "head-driven" part of HPSG corresponds in Word Grammar to a taxonomy of dependencies which distinguishes grammatical functions, with complex combinations similar to HPSG's re-entrancy. The chapter reviews and rejects the evidence for headless phrases, and ends with the suggestion that HPSG could easily move from PS to DG.

# **1 Introduction**

Richard Hudson. 2021. HPSG and Dependency Grammar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean-Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1447–1495. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599880

HPSG is firmly embedded, both theoretically and historically, in the phrase-structure (PS) tradition of syntactic analysis, but it also has some interesting theoretical links to the dependency-structure (DS) tradition. This is the topic of the present chapter, so after a very simple comparison of PS and DS and a glance at the development of these two traditions in the history of syntax, I consider a number of issues where the traditions interact.

The basis for PS analysis is the part-whole relation between smaller units (including words) and larger phrases, so the most iconic notation uses boxes (Müller 2018: 6). In contrast, the basis for DS analysis is the asymmetrical dependency relation between two words, so in this case an iconic notation inserts arrows between words. (Although the standard notation in both traditions uses trees, these are less helpful because the lines are open to different interpretations.) The two analyses of a very simple sentence are juxtaposed in Figure 1. As in HPSG attribute-value matrices (AVMs), each rectangle represents a unit of analysis.

Figure 1: Phrase structure and dependency structure contrasted

In both approaches, each unit has properties such as a classification, a meaning, a form and relations to other items, but these properties may be thought of in two different ways. In PS analyses, an item contains its related items, so it also contains its other properties – hence the familiar AVMs contained within the box for each item. But in DS analyses, an item's related items are outside it, sitting alongside it in the analysis, so, for consistency, other properties may be shown as a network in which the item concerned is just one atomic node. This isn't the only possible notation, but it is the basis for the main DS theory that I shall juxtapose with HPSG, Word Grammar.
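As a concrete (and invented) illustration of the contrast, the same simple sentence can be coded both ways: nested part–whole units for PS, a set of direct head-to-dependent arcs for DS.

```python
# PS groups words into nested part-whole units (a partonomy); DS records
# only direct word-to-word relations. Sentence and labels are invented.

ps = ("S",
      ("NP", ("AP", "small"), ("N", "babies")),
      ("VP", ("V", "cry")))            # every phrase contains its parts

ds = {("cry", "babies"),               # head -> dependent arcs between words;
      ("babies", "small")}             # no multi-word units anywhere

print(ps)
print(sorted(ds))
```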

What, then, are the distinctive characteristics of the two traditions? In the following summary I use *item* to include any syntagmatic unit of analysis including morphemes, words and phrases (though this chapter will not discuss the possible role of morphemes). The following generalisations apply to classic examples of the two approaches: PS as defined by Chomsky in terms of labelled bracketed strings (Chomsky 1957), and DS as defined by Tesnière (1959; 2015). These generalisations refer to "direct relations", which are shown by single lines in standard tree notation; for example, taking a pair of words such as *big book*, they are related directly in DS, but only indirectly via a mother phrase in PS. A phenomenon such as agreement is not a relation in this sense, but it applies to word-pairs which are identified by their relationship; so even if two sisters agree, this does not in itself constitute a direct relation between them.


These generalisations imply important theoretical claims which can be tested; for instance, generalisation 2 claims that there are no discontinuous phrases, which is clearly false. On the other hand, generalisation 3 claims that there can be no exocentric or headless phrases, so DS has to consider apparent counter-examples such as the NPN construction, coordination and verbless sentences (see Sections 4.2 and 5.1 for discussion, and also Abeillé & Chaves 2021, Chapter 16 of this volume).

The contrasts in 1–3 apply without reservation to "plain vanilla" (Zwicky 1985) versions of DS and PS, but as we shall see in the history section, very few theories are plain vanilla. In particular, there are versions of HPSG that allow phrases to be discontinuous (Reape 1994; Kathol 2000; Müller 1995; 1996). Nevertheless, the fact is that HPSG evolved out of more or less pure PS, that it includes *phrase structure* in its name, and that it is never presented as a version of DS.

On the other hand, the term *head-driven* points immediately to dependency: an asymmetrical relation driven by a head word. Even if HPSG gives some constructions a headless analysis (Müller 2018: 654–666), the fact remains that it treats most constructions as headed. This chapter reviews the relations between HPSG and the very long DS tradition of grammatical analysis. The conclusion will be that in spite of its PS roots, HPSG implicitly (and sometimes even explicitly) recognises dependencies; and it may not be a coincidence that one of the main power-bases of HPSG is Germany, where the DS tradition is also at its strongest (Müller 2018: 359).

Where, then, does this discussion leave the notion of a phrase? In PS, phrases are basic units of the analysis, alongside words; but even DS recognises phrases indirectly because they are easily defined in terms of dependencies as a word plus all the words which depend, directly or indirectly, on it. Although phrases play no part in a DS analysis, it is sometimes useful to be able to refer to them informally (in much the same way that some PS grammars refer to grammatical functions informally while denying them any formal status).
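That informal definition is easy to state over dependency arcs; here is a minimal sketch, assuming the arcs form a tree and reusing the invented example above.

```python
# A "phrase" in DS terms: a word plus all the words that depend on it,
# directly or indirectly.

def phrase(word, arcs):
    covered = {word}
    for head, dep in arcs:
        if head == word:
            covered |= phrase(dep, arcs)
    return covered

arcs = {("cry", "babies"), ("babies", "small")}
print(sorted(phrase("babies", arcs)))  # ['babies', 'small']: the "noun phrase"
print(sorted(phrase("cry", arcs)))     # the whole sentence
```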

Why, then, does HPSG use PS rather than DS? As far as I know, PS was simply default syntax in the circles where HPSG evolved, so the choice of PS isn't the result of a conscious decision by the founders, and I hope that this chapter will show that this is a serious question which deserves discussion.<sup>1</sup> Unfortunately, the historical roots and the general dominance of PS have so far discouraged discussion of this fundamental question.

<sup>1</sup> Indeed, I once wrote a paper (which was never published) called "Taking the PS out of HPSG" – a title I was proud of until I noticed that PS was open to misreading, not least as "Pollard and Sag". Carl and Ivan took it well, and I think Carl may even have entertained the possibility that I might be right – possibly because he had previously espoused a theory called "Head Grammar" (HG). See also Flickinger, Pollard & Wasow (2021: Section 2.2), Chapter 2 of this volume on Head Grammar and the evolution of HPSG.

I hasten to add that while the PS view might have been the approach available at the time, there have been many researchers thinking carefully about issues concerning general phrase structure vs. dependency. For example, one general dependency structure is argued to be insufficient to account for complex predicates (Abeillé & Godard 2010; Godard & Samvelian 2021, Chapter 11 of this volume) and negation (Kim & Sag 2002; Kim 2021, Chapter 18 of this volume). See also Müller (2020: Section 11.7) for discussion of analyses of Eroms (2000), Groß & Osborne (2009), and others, and a general comparison of phrase structure and dependency approaches.

HPSG is a theoretical package where PS is linked intimately to a collection of other assumptions; and the same is true for any theory which includes DS, including my own Word Grammar (Hudson 1984; 1990; 1998; 2007; 2010; Gisborne 2010; 2020; Eppler 2010; Traugott & Trousdale 2013). Among the other assumptions of HPSG I find welcome similarities, not least the use of default inheritance in some versions of the theory. I shall argue below that inheritance offers a novel solution to one of the outstanding challenges for the dependency tradition.

The next section sets the historical scene. This is important because it's all too easy for students to get the impression (mentioned above) that PS is just default syntax, and maybe even the same as "traditional grammar". We shall see that grammar has a very long and rather complicated history in which the default is actually DS rather than PS. Later sections then address particular issues shared by HPSG and the dependency tradition.

# **2 Dependency and constituency in the history of syntax**

The relevant history of syntax starts more than two thousand years ago in Greece. (Indian syntax may have started even earlier, but it is hardly relevant because it had so little impact on the European tradition.) Greek and Roman grammarians focused on the morphosyntactic properties of individual words, but since these languages included a rich case system, they were aware of the syntactic effects of verbs and prepositions governing particular cases. However, this didn't lead them to think about syntactic relations, as such; precisely because of the case distinctions, they could easily distinguish a verb's dependents in terms of their cases: "its nominative", "its accusative" and so on (Robins 1967: 29). Both the selecting verb or preposition and the item carrying the case inflection were single words, so the Latin grammar of Priscian, written about 500 AD and still in use a thousand years later, recognised no units larger than the word: "his model of syntax was word-based – a dependency model rather than a constituency model" (Law 2003: 91). However, it was a dependency model without the notion of dependency as a relation between words.

The dependency relation, as such, seems to have been first identified by the Arabic grammarian Sibawayh in the eighth century (Owens 1988; Kouloughli 1999). However, it is hard to rule out the possibility of influence from the thenflourishing Paninian tradition in India, and in any case it doesn't seem to have had any more influence on the European tradition than did Panini's syntax, so it is probably irrelevant.

In Europe, grammar teaching in schools was based on parsing (in its original sense), an activity which was formalised in the ninth century (Luhtala 1994). The activity of parsing was a sophisticated test of grammatical understanding which earned the central place in school work that it held for centuries – in fact, right up to the 1950s (when I myself did parsing at school) and maybe beyond. In HPSG terms, school children learned a standard list of attributes for words of different classes, and in parsing a particular word in a sentence, their task was to provide the values for its attributes, including its grammatical function (which would explain its case). In the early centuries the language was Latin, but more recently it was the vernacular (in my case, English).

Alongside these purely grammatical analyses, the Ancient World had also recognised a logical one, due to Aristotle, in which the basic elements of a proposition (*logos*) are the logical subject (*onoma*) and the predicate (*rhēma*). For Aristotle a statement such as "Socrates ran" requires the recognition both of the person Socrates and of the property of running, neither of which could constitute a statement on its own (Law 2003: 30–31). By the twelfth century, grammarians started to apply a similar analysis to sentences; but in recognition of the difference between logic and grammar they replaced the logicians' *subiectum* and *praedicatum* by *suppositum* and *appositum* – though the logical terms would creep into grammar by the late eighteenth century (Law 2003: 168). This logical approach produced the first top-down analysis in which a larger unit (the logician's proposition or the grammarian's sentence) has parts, but the parts were still single words, so *onoma* and *rhēma* can now be translated as "noun" and "verb". If the noun or verb was accompanied by other words, the older dependency analysis applied.

The result of this confusion of grammar with logic was a muddled hybrid analysis in the Latin/Greek tradition which combines a headless subject-predicate analysis with a headed analysis elsewhere, and which persists even today in some school grammars; this confusion took centuries to sort out in grammatical theory. For the subject and verb, the prestige of Aristotle and logic supported a subject-verb division of the sentence (or clause) in which the subject noun and the verb were both equally essential – a very different analysis from modern first-order logic in which the subject is just one argument (among many) which depends on the predicate. Moreover the grammatical tradition even includes a surprising number of analyses in which the subject noun is the head of the construction, ranging from the modistic grammarians of the twelfth century (Robins 1967: 83), through Henry Sweet (Sweet 1891: 17), to no less a figure than Otto Jespersen in the twentieth (Jespersen 1937), who distinguished "junction" (dependency) from "nexus" (predication) and treated the noun in both constructions as "primary".

The first grammarians to recognise a consistently dependency-based analysis for the rest of the sentence (but not for the subject and verb) seem to have been the French *encyclopédistes* of the eighteenth century (Kahane 2020), and, by the nineteenth century, much of Europe accepted a theory of sentence structure based on dependencies, but with the subject-predicate analysis as an exception – an analysis which by modern standards is muddled and complicated. Each of these units was a single word, not a phrase, and modern phrases were recognised only indirectly by allowing the subject and predicate to be expanded by dependents; so nobody ever suggested there might be such a thing as a noun phrase until the late nineteenth century. Function words such as prepositions had no proper position, being treated typically as though they were case inflections.

The invention of syntactic diagrams in the nineteenth century made the inconsistency of the hybrid analysis obvious. The first such diagram was published in a German grammar of Latin for school children (Billroth 1832), and the nineteenth century saw a proliferation of diagramming systems, including the famous Reed-Kellogg diagrams which are still taught (under the simple name "diagramming") in some American schools (Reed & Kellogg 1877); indeed, there is a website which generates such diagrams, one of which is reproduced in Figure 2.<sup>2</sup> The significant feature of this diagram is the special treatment given to the relation between the subject and predicate (with the verb *are* sitting uncomfortably between the two), with all the other words in the sentence linked by more or less straightforward dependencies. (The geometry of these diagrams also distinguishes grammatical functions.)

<sup>2</sup>See a small selection of diagramming systems at http://dickhudson.com/sentencediagramming/ (last access 2021-03-31), and the website Sentence Diagrammer by 1aiway.

Figure 2: Reed-Kellogg diagram by Sentence Diagrammer

One particularly interesting (and relevant) fact about Reed and Kellogg is that they offer an analysis of *that old wooden house* in which each modifier creates a new unit to which the next modifier applies: *wooden house*, then *old wooden house* (Percival 1976: 18) – a clear hint at more modern structures (including the ones proposed in Section 4.1), albeit one that sits uncomfortably with plain-vanilla dependency structure.

However, even in the nineteenth century, there were grammarians who questioned the hybrid tradition which combined the subject-predicate distinction with dependencies. Rather remarkably, three different grammarians seem to have independently reached the same conclusion at roughly the same time: hybrid structures can be replaced by a homogeneous structure if we take the finite verb as the root of the whole sentence, with the subject as one of its dependents. This idea seems to have been first proposed in print in 1873 by the Hungarian Sámuel Brassai (Imrényi 2013; Imrényi & Vladár 2020); in 1877 by the Russian Aleksej Dmitrievsky (Sériot 2004); and in 1884 by the German Franz Kern (Kern 1884). Both Brassai and Kern used diagrams to present their analyses, and used precisely the same tree structures which Lucien Tesnière in France called *stemmas* nearly fifty years later (Tesnière 1959; 2015). The diagrams have both been redrawn here as Figures 3 and 4.

Brassai's proposal is contained in a school grammar of Latin, so the example is also from Latin – an extraordinarily complex sentence which certainly merits a diagram because the word order obscures the grammatical relations, which can be reconstructed only by paying attention to the morphosyntax. For example, *flentem* and *flens* both mean 'crying', but their distinct case marking links them to different nouns, so the nominative *flens* can modify nominative *uxor* (woman), while the accusative *flentem* defines a distinct individual glossed as 'the crying one'.


(1) *Uxor am-ans fl-ent-em fl-ens acr-ius ips-a ten-eb-at, imbr-e per in-dign-as usque cad-ent-e gen-as.* (Latin)
wife.F.SG.NOM love-PTCP.F.SG.NOM cry-PTCP-M.SG.ACC cry-PTCP.F.SG.NOM bitterly-more self-F.SG.NOM hug-PST-3SG shower-M.SG.ABL on un-becoming-F.PL.ACC continuously fall-PTCP-M.SG.ABL cheeks-F.PL.ACC
'The wife, herself even more bitterly crying, was hugging the crying one, while a shower [of tears] was falling on her unbecoming cheeks [i.e. cheeks to which tears are unbecoming].'

Brassai's diagram, including grammatical functions as translated by the authors (Imrényi & Vladár 2020), is in Figure 3. The awkward horizontal braces should not be seen as a nod in the direction of classical PS, given that the bracketed words are not even adjacent in the sentence analysed. Kern's tree in Figure 4, on the other hand, is for the German sentence in (2).


(2) *Ein-e stolz-e Krähe schmück-t-e sich mit d-en aus-ge-fall-en-en Feder-n d-er Pfau-en.* (German)
a-F.SG.NOM proud-F.SG.NOM crow(F).SG.NOM decorate-PST-3SG self.ACC with the-PL.DAT out-PTCP-fall-PTCP-PL.DAT feather-PL.DAT the-PL.GEN peacock-PL.GEN
'A proud crow decorated himself with the dropped feathers of the peacocks.'

Once again, the original diagram includes function terms which are translated in this diagram into English.

Figure 4: A verb-rooted tree from Kern (1884: 30)

Once again the analysis gives up on prepositions, treating *mit Federn* 'with feathers' as a single word, but Figure 4 is an impressive attempt at a coherent analysis which would have provided an excellent foundation for the explosion of syntax in the next century. According to the classic history of dependency grammar, in this approach,

[…] the sentence is not a basic grammatical unit, but merely results from combinations of words, and therefore […] the only truly basic grammatical unit is the word. A language, viewed from this perspective, is a collection of words and ways of using them in word-groups, i.e., expressions of varying length. (Percival 2007)

But the vagaries of intellectual history and geography worked against this intellectual breakthrough. When Leonard Bloomfield was looking for a theoretical basis for syntax, he could have built on what he had learned at school:

[…] we do not know and may never know what system of grammatical analysis Bloomfield was exposed to as a schoolboy, but it is clear that some of the basic conceptual and terminological ingredients of the system that he was to present in his 1914 and 1933 books were already in use in school grammars of English current in the United States in the nineteenth century. Above all, the notion of sentence "analysis", whether diagrammable or not, had been applied in those grammars. (Percival 2007)

And when he visited Germany in 1913–1914, he might have learned about Kern's ideas, which were already influential there. But instead, he adopted the syntax of the German psychologist Wilhelm Wundt. Wundt's theory applied to meaning rather than syntax, and was based on a single idea: that every idea consists of a subject and a predicate. For example, a phrase meaning "a sincerely thinking person" has two parts: *a person* and *thinks sincerely*; and the latter breaks down, regardless of the grammar, into the noun *thought* and *is sincere* (Percival 1976: 239).

For all its reliance on logic rather than grammar, the analysis is a clear precursor to neo-Bloomfieldian trees: it recognises a single consistent part-whole relationship (a partonomy) which applies recursively. This, then, is the beginning of the PS tradition: an analysis based purely on meaning as filtered through a speculative theory of cognition – an unpromising start for a theory of syntax. However, Bloomfield's school experience presumably explains why he combined Wundt's partonomies with the hybrid structures of Reed-Kellogg diagrams in his classification of structures as endocentric (headed) or exocentric (headless). For him, exocentric constructions include the subject-predicate structure and preposition phrases, both of which were problematic in sentence analysis at school. Consequently, his Immediate Constituent Analysis (ICA) perpetuated the old hybrid mixture of headed and headless structures.

The DS elements of ICA are important in evaluating the history of PS, because they contradict the standard view of history expressed here:

Within the Bloomfieldian tradition, there was a fair degree of consensus regarding the application of syntactic methods as well as about the analyses associated with different classes of constructions. Some of the general features of IC analyses find an obvious reflex in subsequent models of analysis. Foremost among these is the idea that structure involves a part–whole relation between elements and a larger superordinate unit, rather than an asymmetrical dependency relation between elements at the same level. (Blevins & Sag 2013: 202–203)

This quotation implies, wrongly, that ICA rejected DS altogether.

What is most noticeable about the story so far is that, even in the 1950s, we still haven't seen an example of pure phrase structure. Every theory visited so far has recognised dependency relations in at least some constructions. Even Bloomfieldian ICA had a place for dependencies, though it introduced the idea that dependents might be phrases rather than single words and it rejected the traditional grammatical functions such as subject and object. Reacting against the latter gap, and presumably remembering their schoolroom training, some linguists developed syntactic theories which were based on constituent structure but which did have a place for grammatical functions, though not for dependency as such. The most famous of these theories are Tagmemics (Pike 1954) and Systemic Functional Grammar (Halliday 1961; 1967). However, in spite of its very doubtful parentage and its very brief history, by the 1950s virtually every linguist in America seemed to accept without question the idea that syntactic structure was a partonomy.

This is the world in which Noam Chomsky introduced phrase structure, which he presented as a formalisation of ICA, arguing that "customarily, linguistic description on the syntactic level is formulated in terms of constituent analysis (parsing)" (Chomsky 1957: 26). But such analysis was only "customary" among the Bloomfieldians, and was certainly not part of the classroom activity of parsing (Matthews 1993: 147).

Chomsky's phrase structure continued the drive towards homogeneity which had led to most of the developments in syntactic theory since the early nineteenth century. Unfortunately, Chomsky dismissed both dependencies and grammatical functions as irrelevant clutter, leaving nothing but part-whole relations, category-labels, continuity and sequential order.

Rather remarkably, the theory of phrase structure implied the (psychologically implausible) claim that sideways relations such as dependencies between individual words are impossible in a syntactic tree – or at least that, even if they are psychologically possible, they can (and should) be ignored in a formal model. Less surprisingly, having defined PS in this way, Chomsky could easily prove that it was inadequate and needed to be greatly expanded beyond the plain-vanilla version. His solution was the introduction of transformations, but it was only thirteen years before he also recognised the need for some recognition of head-dependent asymmetries in X-bar theory (Chomsky 1970). At the same time, others had objected to transformations and started to develop other ways of making PS adequate. One idea was to include grammatical functions; this idea was developed variously in LFG (Bresnan 1978; 2001), Relational Grammar (Perlmutter & Postal 1983; Blake 1990) and Functional Grammar (Dik 1989; Siewierska 1991). Another way forward was to greatly enrich the categories (Harman 1963) as in GPSG (Gazdar et al. 1985) and HPSG (Pollard & Sag 1994).

Meanwhile, the European ideas about syntactic structure culminating in Kern's tree diagram developed rather more slowly. Lucien Tesnière in France wrote the first full theoretical discussion of DS in 1939, but it was not published till 1959 (Tesnière 1959; 2015), complete with stemmas looking like the diagrams produced seventy years earlier by Brassai and Kern. Somewhat later, these ideas were built into theoretical packages in which DS was bundled with various other assumptions about levels and abstractness. Here the leading players were from Eastern Europe, where DS flourished: the Russian Igor Mel'čuk (Mel'čuk 1988), who combined DS with multiple analytical levels, and the Czech linguists Petr Sgall, Eva Hajičová and Jarmila Panevova (Sgall et al. 1986), who included information structure. My own theory Word Grammar (developed, exceptionally, in the UK), also stems from the 1980s (Hudson 1984; 1990; Sugayama 2003; Hudson 2007; Gisborne 2008; Rosta 2008; Gisborne 2010; Hudson 2010; Gisborne 2011; Eppler 2010; Traugott & Trousdale 2013; Duran-Eppler et al. 2017; Hudson 2016; 2017; 2018; Gisborne 2019). This is the theory which I compare below with HPSG, but it is important to remember that other DS theories would give very different answers to some of the questions that I raise.

DS certainly has a low profile in theoretical linguistics, and especially so in anglophone countries, but there is an area of linguistics where its profile is much higher (and which is of particular interest to the HPSG community): naturallanguage processing (Kübler et al. 2009). For example:


<sup>3</sup>https://en.wikipedia.org/wiki/Treebank (last access 2021-04-06).

<sup>4</sup>https://universaldependencies.org/ (last access 2021-04-06).

<sup>5</sup>https://books.google.com/ngrams/info and search for "dependency" (last access 2021-04-06).

<sup>6</sup>https://nlp.stanford.edu/software/stanford-dependencies.shtml (last access 2021-04-06).


The attraction of DS in NLP is that the only units of analysis are words, so at least these units are given in the raw data and the overall analysis can immediately be broken down into a much simpler analysis for each word. This is as true for a linguist building a treebank as it was for a school teacher teaching children to parse words in a grammar lesson. Of course, as we all know, the analysis actually demands a global view of the entire sentence, but at least in simple examples a bottom-up word-based view will also give the right result.

To summarise this historical survey, PS is a recent arrival, and is not yet a hundred years old. Previous syntacticians had never considered the possibility of basing syntactic analysis on a partonomy. Instead, it had seemed obvious that syntax was literally about how words (not phrases) combined with one another.

# **3 HPSG and Word Grammar**

The rest of this chapter considers a number of crucial issues that differentiate PS from DS, focusing specifically on how they distinguish two particular manifestations of these traditions: HPSG and Word Grammar (WG). The main question is, of course, how strong the evidence is for the PS basis of HPSG, and how easily this basis could be replaced by DS.

The comparison requires some understanding of WG, so what follows is a brief tutorial on the parts of the theory which will be relevant in the following discussion. Like HPSG, WG combines claims about syntactic relations with a number of other assumptions; but for WG, the main assumption is the Cognitive Principle:

(3) The Cognitive Principle:

Language uses the same general cognitive processes and resources as general cognition, and has access to all of them.

This principle is of course merely a hypothesis which may turn out to be wrong, but so far it seems correct (Müller 2018: 494), and it is more compatible with HPSG than with the innatist ideas underlying Chomskyan linguistics (Berwick, Friederici, Chomsky & Bolhuis 2013). In WG, it plays an important part because it determines other parts of the theory.

On the one hand, cognitive psychologists tend to see knowledge as a network of related concepts (Reisberg 2007: 252), so WG also assumes that the whole of language, including grammar, is a conceptual network (Hudson 1984: 1; 2007: 1). One of the consequences is that the AVMs of HPSG are presented instead as labelled network links; for example, we can compare the elementary example in (4) of the HPSG lexical item for a German noun (Müller 2018: 264) with an exact translation using WG notation.


HPSG regards AVMs as equivalent to networks, so translating this AVM into network notation is straightforward; however, it is visually complicated, so I take it in two steps. First I introduce the basic notation in Figure 5: a small triangle showing that the lexeme GRAMMATIK "isa" word, and a headed arrow representing a labelled attribute (here, "phonology") and pointing to its value. The names of entities and attributes are enclosed in rectangles and ellipses respectively.

Figure 5: The German noun *Grammatik* 'grammar' in a WG network

The rest of the AVM translates quite smoothly (ignoring the list for SPR), giving Figure 6, though an actual WG analysis would be rather different in ways that are irrelevant here.

Figure 6: The German noun *Grammatik* 'grammar' in a WG network

The other difference between HPSG and WG that is based on cognitive psychology is that many cognitive psychologists argue that concepts are built around prototypes (Rosch 1973; Taylor 1995), clear cases with a periphery of exceptional cases. This claim implies the logic of default inheritance (Briscoe et al. 1993), which is popular in AI, though less so in logic. In HPSG, default inheritance is accepted by some (Lascarides & Copestake 1999) but not by others (Müller 2018: 403), whereas in WG it plays a fundamental role, as I show in Section 4.1 below. WG uses the isa relation to carry default inheritance, and avoids the problems of non-monotonic inheritance by restricting inheritance to node-creation (Hudson 2018: 18). Once again, the difference is highly relevant to the comparison of PS and DS because one of the basic questions is whether syntactic structures involve partonomies (based on whole:part relations) or taxonomies (based on the isa relation). (I argue in Section 4.1 that taxonomies exist within the structure of a sentence thanks to isa relations between tokens and sub-tokens.)
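Since default inheritance via the isa relation does much of the work in what follows, a minimal Python sketch may make the logic concrete. The dictionary encoding, the node names and the `inherit` function are assumptions of this illustration, not part of WG's (or HPSG's) formal definitions; the example anticipates the word-order case discussed below, where an inverted auxiliary overrides the default position of its subject.

```python
# Default inheritance over an isa taxonomy: a property is inherited up
# the isa chain unless a more specific node overrides it. Node and
# attribute names are illustrative only.

isa = {"inverted-auxiliary": "verb"}       # inverted-auxiliary isa verb
props = {
    "verb": {"subject_position": "before"},               # the default
    "inverted-auxiliary": {"subject_position": "after"},  # the exception
}

def inherit(node):
    """Walk up the isa chain; the most specific value of each
    attribute wins (setdefault keeps what is already there)."""
    result = {}
    while node is not None:
        for attribute, value in props.get(node, {}).items():
            result.setdefault(attribute, value)
        node = isa.get(node)
    return result

print(inherit("verb"))                 # {'subject_position': 'before'}
print(inherit("inverted-auxiliary"))   # {'subject_position': 'after'}
```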

Default inheritance leads to an interesting comparison of the ways in which the two theories treat attributes. On the one hand, they both recognise a taxonomy in which some attributes are grouped together as similar; for example, the HPSG analysis in (4) classifies the attributes CATEGORY and CONTENT as LOCAL, and within CATEGORY it distinguishes the HEAD and SPECIFIER attributes. In WG, attributes are called relations, and they too form a taxonomy. The simplest examples to present are the traditional grammatical functions, which are all subtypes of "dependent"; for example, "object" isa "complement", which isa "valent", which isa "dependent", as shown in Figure 7 (which raises a number of analytical questions, such as the status of depictive predicatives, which are not complements).

Figure 7: A WG taxonomy of grammatical functions

In spite of the differences in the categories recognised, the formal similarity is striking. On the other hand, there is also an important formal difference in the roles played by these taxonomies. In spite of interesting work on default inheritance (Lascarides & Copestake 1999), most versions of HPSG allow generalisations but not exceptions ("If one formulates a restriction on a supertype, this automatically affects all of its subtypes"; Müller 2018: 275), whereas in WG the usual logic of default inheritance applies so exceptions are possible. These are easy to illustrate from word order, which (as explained in Section 4.4) is normally inherited from dependencies: a verb's subject normally precedes it, but an inverted subject (the subject of an inverted auxiliary verb, as in *did he*) follows it.

Another reason for discussing default inheritance and the isa relation is to explain that WG, just like HPSG, is a constraint-based theory. In HPSG, a sentence is grammatical if it can be modelled given the structures and lexicon provided by the grammar, which are combined with each other by inserting less complex structures into daughter slots of more complex structures. Similarly, in WG it is grammatical if its word tokens can all be inherited from entries in the grammar (which also includes the entire lexicon). Within the grammar, these may involve overrides, but overrides between the grammar and the word tokens imply some degree of ungrammaticality. For instance, *He slept* is grammatical because all the properties of *he* and *slept* (including their syntactic properties such as the word order that can be inherited from their grammatical function) can be inherited directly from the grammar, whereas \**Slept he* is ungrammatical in that the order of words is exceptional, and the exception is not licensed by the grammar.

This completes the tutorial on WG, so we are now ready to consider the issues that distinguish HPSG from this particular version of DS. In preparation for this discussion, I return to the three distinguishing assumptions about classical PS and DS theories given earlier as 1 to 3, and repeated here:


These distinctions will provide the structure for the discussion:

- semantic phrasing
- coordination
- phrasal edges
- word order
- structure sharing and raising/lowering
- headless phrases
- complex dependency
- grammatical functions

# **4 Containment and continuity (PS but not DS)**

# **4.1 Semantic phrasing**

One apparent benefit of PS is what I call "semantic phrasing" (Hudson 1990: 146–151), in which adding a dependent to a word modifies that word's meaning to produce a different meaning. For instance, the phrase *typical French house* does not mean 'house which is both typical and French', but rather 'French house which is typical (of French houses)' (Dahl 1980: 486). In other words, even if the syntax does not need a node corresponding to the combination *French house*, the semantics does need one.

For HPSG, of course, this is not a problem, because every dependent is part of a new structure, semantic as well as syntactic (Müller 2019); so the syntactic phrase *French house* has a content which is 'French house'. But for DS theories, this is not generally possible, because there is no syntactic node other than those for individual words – so, in this example, one node for *house* and one for *French* but none for *French house*.

Fortunately for DS, there is a solution: create extra word nodes but treat them as a taxonomy, not a partonomy (Hudson 2018). To appreciate the significance of this distinction, the connection between the concepts "finger" and "hand" is a partonomy, but that between "index finger" and "finger" is a taxonomy; a finger is part of a hand, but it is not a hand, and conversely an index finger is a finger, but it is not part of a finger.

In this analysis, then, the token of *house* in *typical French house* would be factored into three distinct nodes:

- *house*: the bare word token
- *house+F*: *house* as modified by *French*
- *house+F+t*: *house+F* as further modified by *typical*
(It is important to remember that the labels are merely hints to guide the analyst, and not part of the analysis; so the last label could have been *house+t+F* without changing the analysis at all. One of the consequences of a network approach is that the only substantive elements in the analysis are the links between nodes, rather than the labels on the nodes.) These three nodes can be justified as distinct categories because each combines a syntactic fact with a semantic one: for instance, *house* doesn't simply mean 'French house', but has that meaning because it has the dependent *French*. The alternative would be to add all the dependents and all the meanings to a single word node, as in earlier versions of WG (Hudson 1990: 146–151), thereby removing all the explanatory connections; this seems much less plausible psychologically. The proposed WG analysis of *typical French house* is shown in Figure 8, with the syntactic structure on the left and the semantics on the right.

Figure 8: *typical French house* in WG

Unlike standard DG analyses (Müller 2019), the number of syntactic nodes in this analysis is the same as in an HPSG analysis, but crucially these nodes are linked by the isa relation, and not as parts to wholes – in other words, the hierarchy is a taxonomy, not a partonomy. As mentioned earlier, the logic is default inheritance, and the default semantics has isa links parallel to those in syntax; thus the meaning of *house+F* (*house* as modified by *French*) isa the meaning of *house* – in other words, a French house is a kind of house. But the default can be overridden by exceptions such as the meanings of adjectives like *fake* and *former*, so a fake diamond is not a diamond (though it looks like one) and a former soldier is no longer a soldier.<sup>7</sup> The exceptional semantics is licensed by the grammar – the stored network – so the sentence is fully grammatical. All this is possible because of the same default inheritance that allows irregular morphology and syntax.
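To make the factoring and its semantic consequences concrete, here is a toy Python rendering. The node labels follow the mnemonic hints used in the text (*house+F* and so on), and the encoding is an assumption of this sketch, not a formal WG definition.

```python
# Sub-tokens form a taxonomy, not a partonomy: house+F isa house, and
# by default the meaning of a sub-token isa the meaning of its parent
# (hyponymy). Exceptional adjectives like "fake" override the default.

isa = {"house+F": "house", "house+F+t": "house+F",
       "diamond+fake": "diamond"}
hyponym_of_parent = {"diamond+fake": False}   # the lexical override

def meaning_isa_base(token):
    """Default inheritance: True unless some link up the chain is
    overridden, as with 'fake' and 'former'."""
    while token in isa:
        if not hyponym_of_parent.get(token, True):
            return False
        token = isa[token]
    return True

print(meaning_isa_base("house+F+t"))     # True: a typical French house isa house
print(meaning_isa_base("diamond+fake"))  # False: a fake diamond is not a diamond
```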

# **4.2 Coordination**

Another potential argument for PS, and against DS, is based on coordination: coordination is a symmetrical relationship, not a dependency, and it coordinates phrases rather than single words. For instance, in (5) the coordination clearly links the VPs *came in* to *sat down* and puts them on equal grammatical terms; and it is this equality that allows them to share the subject *Mary*.

<sup>7</sup>See also Koenig & Richter (2021: Section 3.2), Chapter 22 of this volume on adjunct scope.


(5) Mary came in and sat down.

But of course, in a classic DS analysis *Mary* is also attached directly to *came*, without an intervening VP node, so *came in* is not a complete syntactic item and this approach to coordination fails; we thus have a prima facie case against DS. (For coordination in HPSG, see Abeillé & Chaves 2021, Chapter 16 of this volume.)

Fortunately, there is a solution: sets (Hudson 1990: 404–421). We know from the vast experimental literature (as well as from everyday experience) that the human mind is capable of representing ordered sets (strings) of words, so all we need to assume is that we can apply this ability in the case of coordination. The members of a set are all equal, so their relation is symmetrical; and the members may share properties (e.g. a person's children constitute a set united by their shared relation to that person as well as by a multitude of other shared properties). Moreover, sets may be combined into supersets, so both conjuncts such as *came in* and *sat down* and coordinations (*came in and sat down*) are lists. According to this analysis, then, the two lists (*came*, *in*) and (*sat*, *down*) are united by their shared subject, Mary, and combine into the coordination ((*came*, *in*) (*sat*, *down*)). The precise status of the conjunction *and* remains to be determined. The proposed analysis is shown in network notation in Figure 9.

Figure 9: Coordination with sets

Once again, inheritance plays a role in generating this diagram. The isa links have been omitted in Figure 9 to avoid clutter, but they are shown in Figure 10, where the extra isa links are compensated for by removing all irrelevant matter and the dependencies are numbered for convenience. In this diagram, the dependency d1 from *came* to *Mary* is the starting point, as it is established during the processing of *Mary came* – long before the coordination is recognised; and the endpoint is the dependency d5 from *sat* to *Mary*, which is simply a copy of d1, so the two are linked by isa. (It will be recalled from Figure 7 that dependencies form a taxonomy, just like words and word classes, so isa links between dependencies are legitimate.) The conjunction *and* creates the three set nodes, and general rules for sets ensure that properties – in this case, dependencies – can be shared by the two conjuncts.

It's not yet clear exactly how this happens, but one possibility is displayed in the diagram: d1 licenses d2 which licenses d3 which licenses d4 which licenses d5. Each of these licensing relations is based on isa. Whatever the mechanism, the main idea is that the members of a set can share a property; for example, we can think of a group of people sitting in a room as a set whose members share the property of sitting in the room. Similarly, the set of strings *came in* and *sat down* share the property of having *Mary* as their subject.
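A toy Python rendering of the set-based idea may help. The tuple encoding, the assumption that the first word of each string is its head, and the triple format for dependencies are all simplifications of this sketch rather than WG's official machinery.

```python
# Conjuncts as ordered sets (tuples) of words; the coordination is a
# superset of the conjuncts, and a shared dependency (here the subject
# Mary) is propagated to each conjunct, cf. d1 ... d5 in Figure 10.

conjunct1 = ("came", "in")
conjunct2 = ("sat", "down")
coordination = (conjunct1, conjunct2)

def share_subject(coord, subject):
    """Copy the subject dependency to the head word of each conjunct
    (taken here, for simplicity, to be its first word)."""
    return [(conjunct[0], subject, "subject") for conjunct in coord]

print(share_subject(coordination, "Mary"))
# [('came', 'Mary', 'subject'), ('sat', 'Mary', 'subject')]
```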

Figure 10: Coordination with inherited dependencies

The proposed analysis may seem to have adopted phrases in all but name, but this is not so because the conjuncts have no grammatical classification, so coordination is not restricted to coordination of like categories. This is helpful with examples like (6) where an adjective is coordinated with an NP and a PP.

(6) Kim was intelligent, a good linguist and in the right job.

The possibility of coordinating mixed categories is a well-known challenge for PS-based analyses such as HPSG: "Ever since Sag et al. (1985), the underlying intuition was that what makes Coordination of Unlikes acceptable is that each conjunct is actually well-formed when combined individually with the shared rest" (Crysmann 2008: 61). Put somewhat more precisely, the intuition is that what coordinated items share is not their category but their function (Hudson 1990: 414). This is more accurate because simple combinability isn't enough; for instance, *we ate* can combine with an object or with an adjunct, but the functional difference prevents them from coordinating:


Similarly, *a linguist* can combine as dependent with many verbs, but these can only coordinate if their relation to *a linguist* is the same:


It is true that HPSG can accommodate the coordination of unlike categories by redefining categories so that they define functions rather than traditional categories; for example, if "predicative" is treated as a category, then the problem of (6) disappears because *intelligent*, *a good linguist* and *in the right job* all belong to the category "predicative". However, this solution generates as many problems as it solves. For example, why is the category "predicative" exactly equivalent to the function with the same name, whereas categories such as "noun phrase" have multiple functions? And how does this category fit into a hierarchy of categories so as to bring together an arbitrary collection of categories which are otherwise unrelated: nominative noun phrase, adjective phrase and preposition phrase?

Moreover, since the WG analysis is based on arbitrary strings and sets rather than phrases, it easily accommodates "incomplete" conjuncts (Hudson 1990: 405; Hudson 1982) precisely because there is no expectation that strings are complete phrases. This claim is borne out by examples such as (13) (meaning '… and parties for foreign girls …').

(13) We hold parties for foreign *boys on Tuesdays* and *girls on Wednesdays*.

In this example, the first conjunct is the string (*boys*, *on*, *Tuesdays*), which is not a phrase defined by dependencies; the relevant phrases are *parties for foreign boys* and *on Tuesdays*.


This sketch of a WG treatment of coordination ignores a number of important issues (raised by reviewers) such as joint interpretation (14) and special choice of pronoun forms (15).


These issues have received detailed attention in WG (Hudson 1984: Chapter 5; 1988; 1990: Chapter 14; 1995; 2010: 175–181, 304–307), but they are peripheral to this chapter.

# **4.3 Phrasal edges**

One of the differences between PS and DS is that, at least in its classic form, PS formally recognises phrasal boundaries, and a PS tree can even be converted to a bracketed string where the phrase is represented by its boundaries. In contrast, although standard DS implies phrases (since a phrase can be defined as a word and all the words depending on it either directly or indirectly), it doesn't mark their boundaries.

This turns out to be problematic in dealing with Welsh soft mutation (Tallerman 2009). Tallerman's article is one of the few serious discussions by a PS advocate of the relative merits of PS and DS, so it deserves more consideration than space allows here. It discusses examples such as (16) and (17), where the emphasised words are morphologically changed by soft mutation in comparison with their underlying forms shown in brackets.

(16) Prynodd y ddynes *delyn*. (telyn) (Welsh)
buy.PST.3S the woman harp
'The woman bought a harp.'

(17) Gwnaeth y ddynes [*werthu* telyn]. (gwerthu)
do.PST.3S the woman sell.INF harp
'The woman sold a harp.'

Soft mutation is sensitive to syntax, so although 'harp' is the object of a preceding verb in both examples, it is mutated when this verb is finite (*prynodd*) and followed by a subject, but not when the verb has no subject because it is nonfinite (*werthu*). Similarly, the non-finite verb 'sell' is itself mutated in example (17) because it follows a subject, in contrast with the finite verbs which precede the subject and have no mutation.


A standard PS explanation for such facts (and many more) is the "XP Trigger Hypothesis": that soft mutation is triggered on a subject or complement (but not an adjunct) immediately after an XP boundary (Borsley et al. 2007: 226). The analysis contains two claims: that mutation affects the first word of an XP, and that it is triggered by the end of another XP. The first claim seems beyond doubt: the mutated word is simply the first word, and not necessarily the head. Examples such as (18) are conclusive.

(18) Dw i [*lawn* mor grac â chi]. (llawn) (Welsh)
be.PRS.1S I full as angry as you
'I'm just as angry as you.'

The second claim is less clearly correct; for instance, it relies on controversial assumptions about null subjects and traces in examples such as (19) and (20) (where *t* and *pro* stand for a trace and a null subject respectively, but have to be treated as full phrases for purposes of the XP Trigger Hypothesis in order to explain the mutation following them).

(19) Pwy brynodd *t* delyn? (telyn) (Welsh)
who buy.PST.3S harp
'Who bought a harp?'

(20) Prynodd *pro* delyn. (telyn)
buy.PST.3S harp
'He/she bought a harp.'

But suppose both claims were true. What would this imply for DS? All it shows is that we need to be able to identify the first word in a phrase (the mutated word) and the last word in a phrase (the trigger). This is certainly not possible in WG as it stands, but the basic premise of WG is that the whole of ordinary cognition is available to language, and it's very clear that ordinary cognition allows us to recognise beginnings and endings in other domains, so why not also in language? Moreover, beginnings and endings fit well in the framework of ideas about linearisation that are introduced in the next subsection.

The Welsh data, therefore, do not show that we need phrasal nodes complete with attributes and values. Rather, edge phenomena such as Welsh mutation show that DS needs to be expanded, but not that we need the full apparatus of PS. Exactly how to adapt WG is a matter for future research, not for this chapter.


# **4.4 Word order**

In both WG and some variants of HPSG, dominance and linearity are separated, but this separation goes much further in WG. In basic HPSG, linearisation rules apply only to sisters, and if the binary branching often assumed for languages such as German (Müller 2018: Section 10.3) reduces these to just two, the result is clearly too rigid given the freedom of ordering found in many languages. It is true that solutions are available (Müller 2018: Chapter 10), such as allowing alternative binary branchings for the same word combinations (Müller 2021b: Section 3, Chapter 10 of this volume) or combining binary branching with flat structures held in lists, but these solutions involve extra complexity in other parts of the theory such as additional lists. For instance, one innovation is the idea of linearisation domains (Reape 1994; Kathol 2000; Müller 1996), which allow a verb and its arguments and adjuncts to be members of the same linearisation domain and hence to be realized in any order (Müller 2018: 302; Müller 2021b: Section 6, Chapter 10 of this volume). These proposals bring HPSG nearer to DS, where flat structures are inevitable and free order is the default (subject to extra order constraints).

WG takes the separation of linearity from dominance a step further by introducing two new syntactic relations dedicated to word order: "position" and "landmark", each of which points to a node in the overall network (Hudson 2018). As its name suggests, a word's landmark is the word from which it takes its position, and is normally the word on which it depends (as in the HPSG list of dependents); what holds phrases together by default is that dependents keep as close to their landmarks as possible, because a general principle bans intersecting landmark relations. Moreover, the word's "position" relative to its landmark may either be free or defined as either "before" or "after".
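The ban on intersecting landmark relations amounts to a projectivity check over arcs, which can be sketched as follows. The arc encoding and the example are illustrative assumptions of this sketch, not a WG implementation.

```python
# The general principle banning intersecting landmark relations can be
# stated as a no-crossing (projectivity) condition on arcs over word
# positions.

def arcs_cross(a, b):
    """Two arcs cross iff exactly one endpoint of one arc lies
    strictly inside the other."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j

# "He slept deeply": he -> slept and deeply -> slept as landmark arcs.
ok = [(0, 1), (2, 1)]
bad = [(0, 2), (1, 3)]        # two arcs that would intersect

def all_projective(arcs):
    return not any(arcs_cross(a, b) for a in arcs for b in arcs if a < b)

print(all_projective(ok))     # True: no intersecting landmark relations
print(all_projective(bad))    # False: banned by the default principle
```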

However, this default pattern allows exceptions, and because "position" and "landmark" are properties, they are subject to default inheritance which allows exceptions such as raising and extraction (discussed in Section 5.2). To give an idea of the flexibility allowed by these relations, I start with the very easy English example in Figure 11, where "lm" and "psn" stand for "landmark" and "position", and "<" and ">" mean "before" and "after".

Figure 11: Basic word order in English

It could be objected that this is a lot of formal machinery for such a simple matter as word order. However, it is important to recognise that the left-right ordering of writing is just a written convention, and that a mental network (which is what we are trying to model in WG) has no left-right ordering. Ordering a series of objects (such as words) is a complex mental operation, which experimental subjects often get wrong, so complex machinery is appropriate. Moreover, any syntactician knows that language offers a multiplicity of complex relations between dependency structure and word order. To take an extreme example, non-configurational languages pose problems for standard versions of HPSG (for which Bender suggests solutions), as illustrated by a Wambaya sentence, repeated here as (21) (Bender 2008: 8; Nordlinger 1998):<sup>8</sup>

(21) Ngaragana-nguja ngiy-a gujinganjanga-ni jiyawu ngabulu (Wambaya)
grog-PROP.IV.ACC 3SG.NM.A-PST mother-II.ERG give milk.IV.ACC
'(His) mother gave (him) milk with grog in it.'

The literal gloss shows that both 'grog' and 'milk' are marked as accusative, which is enough to allow the former to modify the latter in spite of their separation. The word order is typical of many Australian non-configurational languages: totally free within the clause except that the auxiliary verb (glossed here as 3SG.NM.A-PST) comes second (after one dependent word or phrase). Such freedom of order is easily accommodated if landmarks are independent of dependencies: the auxiliary verb is the root of the clause's dependency structure (as in English), and also the landmark for every word that depends on it, whether directly or (crucially) indirectly. Its second position is due to a rule which requires it to precede all these words by default, but to have just one "preceder". A simplified structure for this sentence (with Wambaya words replaced by English glosses) is shown in Figure 12, with dotted arrows below the words again showing landmark and position relations. The dashed horizontal line separates this sentence structure from the grammar that generates it. In words, an auxiliary verb requires precisely one preceder, which isa descendant. "Descendant" is a transitive generalisation of "dependent", so a descendant is either a dependent or a dependent of a descendant. The preceder precedes the auxiliary verb, but all other descendants follow it.

<sup>8</sup>See also Müller (2021b: Section 7), Chapter 10 of this volume for a discussion of Bender's approach and Müller (2021b: Section 6.2), Chapter 10 of this volume for an analysis of the phenomenon in linearization-based HPSG.

Figure 12: A non-configurational structure
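The "precisely one preceder" generalisation can be rendered as a simple check. The flat encoding of the clause below is an assumption of this sketch; the WG account of course states the constraint declaratively in the network rather than procedurally.

```python
# A toy check of the auxiliary-second pattern: the auxiliary is the
# landmark of all its descendants; by default they all follow it, but
# precisely one "preceder" must precede it.

def aux_second_ok(words, aux):
    """True iff exactly one word precedes the auxiliary."""
    return words.index(aux) == 1

clause = ["grog-PROP", "AUX", "mother-ERG", "give", "milk"]
print(aux_second_ok(clause, "AUX"))                   # True: one preceder
print(aux_second_ok(["AUX", "give", "milk"], "AUX"))  # False: no preceder
```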


Later sections will discuss word order, and will reinforce the claims of this subsection: that plain-vanilla versions of either PS or DS are woefully inadequate and need to be supplemented in some way.

This completes the discussion of "containment" and "continuity", the characteristics of classical PS which are missing in DS. We have seen that the continuity guaranteed by PS is also provided by default in WG by a general ban on intersecting landmark relations; but, thanks to default inheritance, exceptions abound. HPSG offers a similar degree of flexibility but using different machinery such as word-order domains (Reape 1994); see also Müller (2021b), Chapter 10 of this volume. An approach to Wambaya not using linearisation domains but rather projection of valence information is discussed in Section 7 of Müller (2021b). Moreover, WG offers a great deal of flexibility in other relations: for example, a word may be part of a string (as in coordination) and its phrase's edges may need to be recognised structurally (as in Welsh mutation).

# **5 Asymmetry and functions**

This section considers the characteristics of DS which are missing from classical PS: asymmetrical relations between words and their dependents. Does syntactic theory need these notions? It's important to distinguish here between two different kinds of asymmetry that are recognised in HPSG. One is the kind which is inherent to PS and the part-whole relation, but the other is inherent to DS but an optional extra in PS: the functional asymmetry between the head and its dependents. HPSG, like most other theories of syntax, does recognise this asymmetry and indeed builds it into the name of the theory, but more recently this assumption has come under fire within the HPSG community for reasons considered below in Section 5.1.

But if the head/dependent distinction is important, are there any other functional distinctions between parts that ought to be explicit in the analysis? In other words, what about grammatical functions such as subject and object? As Figure 7 showed, WG recognises a taxonomy of grammatical functions which carry important information about word order (among other things), so functions are central to WG analyses. Many other versions of DS also recognise functional distinctions; for example, Tesnière distinguished actants from circumstantials, and among actants he distinguished subjects, direct objects and indirect objects (Tesnière 2015: xlvii). But the only functional distinction which is inherent to DS is the one between head and dependents, so other such distinctions are an optional extra in DS – just as they are in PS, where many theories accept them. But HPSG leaves them implicit in the order of elements in ARG-ST (like phrases in DS), so this is an issue worth raising when comparing HPSG with the DS tradition.

# **5.1 Headless phrases**

Bloomfield assumed that phrases could be either headed (endocentric) or not (exocentric). According to WG (and other DS theories), there are no headless phrases. Admittedly, utterances may contain unstructured lists (e.g. *one two three four* …), and quotations may be unstructured strings, as in (22), but presumably no-one would be tempted to call such strings "phrases", or at least not in the sense of phrases that a grammar should generate.

(22) He said "One, two, three, testing, testing, testing."

Such strings can be handled by the mechanism already introduced for coordination, namely ordered sets.

The WG claim, then, is that when words hang together syntactically, they form phrases which always have a head. Is this claim tenable? There are a number of potential counterexamples including (23a)–(23d):

	- b. *The more you eat*, the fatter you get.<sup>10</sup>
	- c. In they came, *student after student*. 11
	- d. *However intelligent the students*, a lecture needs to be clear.<sup>12</sup>

All these examples can in fact be given a headed analysis, as I shall now explain, starting with (23a). *The rich* is allowed by *the*, which has a special subcase which allows a single adjective as its complement, meaning either "generic people" or some contextually defined notion (such as "apples" in *the red* used when discussing apples); this is not possible with any other determiner. In the determiner-headed analysis of standard WG, this is unproblematic as the head is *the*.

The comparative correlative in (23b) is clearly a combination of a subordinate clause followed by a main clause (Culicover & Jackendoff 1999), but what are the heads of the two clauses? The obvious dependency links the first *the* with the second (hence "correlative"), so it is at least worth considering an analysis in which this dependency is the basis of the construction and, once again, the head is *the*. Figure 13 outlines a possible analysis, though it should be noted that the dependency structures are complex. The next section discusses such complexities, which are a reaction to complex functional pressures; for example, it is easy to see that the fronting of *the less* reduces the distance between the two correlatives. Of course, there is no suggestion here that this analysis applies unchanged to every translation equivalent of our comparative correlative; for instance, French uses a coordinate structure without an equivalent of *the*: *Plus … et plus …* (Abeillé & Borsley 2008; Abeillé & Chaves 2021: Section 3.3, Chapter 16 of this volume).

<sup>9</sup>Müller (2018: 403)

<sup>10</sup>Fillmore (1987: 164)

<sup>11</sup>Jackendoff (2008: 8)

<sup>12</sup>Adapted from Arnold & Borsley (2014: 28).

Figure 13: A WG sketch of the comparative correlative

Example (23c) is offered by Jackendoff as a clear case of headlessness, but there is an equally obvious headed analysis of *student after student* in which the structure is the same as in commonplace NPN examples like *box of matches*. The only peculiarity of Jackendoff's example is the lexical repetition, which is beyond most theories of syntax. For WG, however, the solution is easy: the second N token isa the first, which allows default inheritance. This example illustrates an idiomatic but generalisable version of the NPN pattern in which the second N isa the first and the meaning is special; as expected, the pattern is recursive. The grammatical subnetwork needed to generate the syntactic structure for such examples is shown (with solid lines) in Figure 14; the semantics is harder and needs more research. What this diagram shows is that there is a subclass of nouns called here "nounnpn", which is special in having as its complement a preposition with the special property of having another copy of the same nounnpn as its complement. The whole construction is potentially recursive because the copy itself inherits the possibility of a preposition complement, but the recursion is limited by the fact that this complement is optional (shown as "0,1" inside the box, meaning that its quantity is either 0 (absent) or 1 (present)). Because the second noun isa the first, if it has a prepositional complement this is also a copy of the first preposition – hence *student after student after student*, whose structure is shown in Figure 14 with dashed lines.


Figure 14: The NPN construction in Word Grammar

The "exhaustive conditional" or "unconditional" in (23d) clearly has two parts: *however smart* and *the students*, but which is the head? A verb could be added, giving *however smart the students are*, so if we assumed a covert verb, that would provide a head, but without a verb it is unclear – and indeed this is precisely the kind of subject-predicate structure that stood in the way of dependency analysis for nearly two thousand years.

However, there are good reasons for rejecting covert verbs in general. For instance, in Arabic a predicate adjective or nominal is in different cases according to whether "be" is overt: accusative when it is overt, nominative when it is covert. Moreover, the word order is different in the two constructions: the verb normally precedes the subject, but the verbless predicate follows it. In Arabic, therefore, a covert verb would simply complicate the analysis; but if an analysis without a covert verb is possible for Arabic, it is also possible in English.

Moreover, even English offers an easy alternative to the covert verb based on the structure where the verb BE is overt. It is reasonably uncontroversial to assume a raising analysis for examples such as (24a) and (24b), so (24c) invites a similar analysis (Müller 2009; 2012).

	- b. He is talking.
	- c. He is cold.

But a raising analysis implies a headed structure for *he ... cold* in which *he* depends (as subject) on *cold*. Given this analysis, the same must be true even where there is no verb, as in example (23d)'s *however intelligent the students* or so-called "Mad-Magazine sentences" like (25) (Lambrecht 1990).<sup>13</sup>

(25) What, him smart? You're joking!

Comfortingly, the facts of exhaustive conditionals support this analysis because the subject is optional, confirming that the predicate is the head:

(26) However smart, nobody succeeds without a lot of effort.

In short, where there is just a subject and a predicate, without a verb, then the predicate is the head.

Clearly it is impossible to prove the non-existence of headless phrases, but the examples considered have been offered as plausible examples, so if even they allow a well-motivated headed analysis, it seems reasonable to hypothesise that all phrases have heads.

# **5.2 Complex dependency**

The differences between HPSG and WG raise another question concerning the geometry of sentence structure, because the possibilities offered by the part-whole relations of HPSG are more limited than those offered by the word-word dependencies of WG. How complex can dependencies be? Is there a theoretical limit such that some geometrical patterns can be ruled out as impossible? Two particular questions arise:

- Can one word depend simultaneously on two other words?
- Can two words depend on each other (mutual dependency)?
The answer to both questions is yes for WG, but is less clear for HPSG. Consider the dependency structure for an example such as (27).

(27) I wonder who came.

<sup>13</sup>A reviewer asks what excludes alternatives such as \**He smart?* and \**Him smart.* (i.e. as a statement). The former is grammatically impossible because *he* is possible only as the subject of a tensed verb, but presumably the latter is excluded by the pragmatic constraints on the "Mad-magazine" construction.


In a dependency analysis, the only available units are words, so the clause *who came* has no status in the analysis and is represented by its head. In WG, this is *who*, because this is the word that links *came* to the rest of the sentence.

Of interest in (27) are three dependencies:

- *who* is the complement of *wonder*
- *who* is the subject of *came*
- *came* is the complement of *who*
Given the assumptions of DS, and of WG in particular, each of these dependencies is quite obvious and uncontroversial when considered in isolation. The problem, of course, is that they combine in an unexpectedly complicated way; in fact, this one example illustrates both the complex conditions defined above: *who* depends on two words which are not otherwise syntactically connected (*wonder* and *came*), and *who* and *came* are mutually dependent. A WG analysis of the relevant dependencies is sketched in Figure 15 (where "s" and "c" stand for "subject" and "complement").

Figure 15: Complex dependencies in a relative clause

A similar analysis applies to relative clauses. For instance, in (28), the relative pronoun *who* depends on the antecedent *man* as an adjunct and on *called* as its subject, while the "relative verb" *called* depends on *who* as its obligatory complement.

(28) I knew the man who called.


Pied-piping presents well-known challenges. Take, for example, (29) (Pollard & Sag 1994: 212).

(29) Here's the minister [[in [the middle [of [whose sermon]]]] the dog barked]

According to WG, *whose* (which as a determiner is head of the phrase *whose sermon*) is both an adjunct of its antecedent *minister* and the head of the relative verb *barked*, just as in the simpler example. The challenge is to explain the word order: how can *whose* have dependency links to both *minister* and *barked* when it is surrounded, on both sides, by words on which it depends? Normally, this would be impossible, but pied-piping is special. The WG analysis (Hudson 2018) locates the peculiarities of pied-piping entirely in the word order, invoking a special relation "pipee" which transfers the expected positional properties of the relative pronoun (the "piper") up the dependency chain – in this case, to the preposition *in*.

And so we finish this review of complex dependencies by answering the question that exercised the minds of the Arabic grammarians in the Abbasid Caliphate: is mutual dependency possible? The arrow notation of WG allows grammars to generate the relevant structures, so the answer is yes, and HPSG can achieve the same effect by means of re-entrancy (see Pollard & Sag (1994: 50) for the mutual selection of determiner and noun); so this conclusion reflects another example of theoretical convergence.
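The re-entrancy point can be illustrated with shared structure in any language that allows cyclic references. In the Python sketch below, the encoding and attribute names are assumptions of the illustration, but the cycle itself mirrors the mutual dependency of *who* and *came*.

```python
# Mutual dependency as re-entrancy: the two word tokens contain each
# other as attribute values, giving a cyclic (shared) structure.

who = {"form": "who"}
came = {"form": "came"}
who["complement"] = came    # came is the complement of who
came["subject"] = who       # who is the subject of came

print(who["complement"]["subject"]["form"])  # 'who': the cycle is real
print(came["subject"] is who)                # True: token identity, not a copy
```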

# **5.3 Grammatical functions**

As I have already explained, more or less traditional grammatical functions such as subject and adjunct play a central part in WG, and more generally, they are highly compatible with any version of DS, because they are all sub-divisions of the basic function "dependent". This being so, we can define a taxonomy of functions such as the one in Figure 7, parts of which are developed in Figure 16 to accommodate an example of the very specific functions which are needed in any complete grammar: the second complement of *from*, as in *from London to Edinburgh*, which may be unique to this particular preposition.

Figure 16: A taxonomy of grammatical functions

HPSG also recognises a taxonomy of functions by means of three lists attached to any head word:

- SPR: its specifier and its subject
- COMPS: its complements
- ARG-ST: its specifier, its subject, and its complements, i.e. in WG terms, its valents.

The third list concatenates the first two, so the same analysis could be achieved in WG by a taxonomy in which SPR and COMPS both isa ARG-ST. However, there are also two important differences: in HPSG, adjuncts have a different status from other dependents, and these three general categories are lists.
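Since list concatenation is all that is involved, the relation between the three lists can be sketched directly. The Python rendering below is illustrative only; note how position on the list, rather than a label, identifies a function such as "direct object", a point taken up later in this section.

```python
# ARG-ST as the concatenation of SPR and COMPS: position on the list,
# not a dedicated label, identifies functions such as "direct object"
# (the second NP on ARG-ST).

spr = ["she"]            # specifier/subject list for "ate"
comps = ["it"]           # complement list
arg_st = spr + comps     # the third list concatenates the first two

print(arg_st)                                  # ['she', 'it']
print(arg_st[1] if len(arg_st) > 1 else None)  # 'it': the direct object slot
```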

Adjuncts are treated differently in the two theories. In WG, they are dependents, and located in the same taxonomy as valents; so in HPSG terms they would be listed among the head word's attributes, along with the other dependents but differentiated by not being licensed by the head. But HPSG reverses this relationship by treating the head as the MOD ("modified") value of the adjunct. For example, in (30) *she* and *it* are listed in the ARG-ST of *ate* but *quickly* is not mentioned in the AVM of *ate*; instead, *ate* is listed as the MOD of *quickly*.

(30) She ate it quickly.

This distinction, inherited from Categorial Grammar, correctly reflects the facts of government: *ate* governs *she* and *it*, but not *quickly*. It also reflects one possible analysis of the semantics, in which *she* and *it* provide built-in arguments of the predicate "eat", while *quickly* provides another predicate "quick", of which the whole proposition *eat*(*she*, *it*) is the argument. Other semantic analyses are of course possible, including one in which "manner" is an optional argument; but the proposed analysis is consistent with the assumptions of HPSG.


On the other hand, HPSG also recognises a HEAD-DAUGHTER in schemata like the Specifier-Head, the Filler-Head, the Head-Complement and the Head-Adjunct Schema; in the construction which includes *quickly*, the latter is not the head. So what unifies arguments and adjuncts is the fact of not being heads (being members of the NON-HEAD-DTRS list in some versions of HPSG). In contrast, DS theories (including WG) agree in recognising adjuncts as dependents, so arguments and adjuncts are unified by this category, which is missing from most versions of HPSG, though not from all (Bouma, Malouf & Sag 2001). The DS analysis follows from the assumption that dependency isn't just about government, nor is it tied to a logical analysis based on predicates and arguments. At least in WG, the basic characteristic of a dependent is that it modifies the meaning of the head word, so that the resultant meaning is (typically) a hyponym of the head's unmodified meaning. Given this characterisation, adjuncts are core dependents; for instance *big book* is a hyponym of *book* (i.e. "big book" isa "book"), and *she ate it quickly* is a hyponym of *she ate it*. The same characterisation also applies to arguments: *ate it* is a hyponym of *ate*, and *she ate it* is a hyponym of *ate it*. (Admittedly hyponymy is merely the default, and as explained in Section 4.1 it may be overridden by the details of particular adjuncts such as *fake* as in *fake diamonds*; but exceptions are to be expected.)

Does the absence in HPSG of a unifying category "dependent" matter? So long as HEAD is available, we can express word-order generalisations for head-final and head-initial languages, and maybe also for "head-medial" languages such as English (Hudson 2010: 172). At least in these languages, adjuncts and arguments follow the same word-order rules, but although it is convenient to have a single cover term "dependent" for them, it is probably not essential. So maybe the presence of HEAD removes the need for its complement term, DEPENDENT.

The other difference between HPSG and WG lies in the way in which the finer distinctions among complements are made. In HPSG they are shown by the ordering of elements in a list, whereas WG distinguishes them as further subcategories in a taxonomy. For example, in HPSG the direct object is identified as the second NP in the ARG-ST list, but in WG it is a sub-category of "complement" in the taxonomy of Figure 16. In this case, each approach seems to offer something which is missing from the other.

On the one hand, the ordered lists of HPSG reflect the attractive ranking of dependents offered by Relational Grammar (Perlmutter & Postal 1983; Blake 1990) in which arguments are numbered from 1 to 3 and can be "promoted" or "demoted" on this scale. The scale had subjects at the top and remote adjuncts at the bottom, and appeared to explain a host of facts from the existence of argument-changing alternations such as passivisation (Levin 1993) to the relative accessibility of different dependents to relativisation (Keenan & Comrie 1977). An ordered list, as in ARG-ST, looks like a natural way to present this ranking of dependents.

On the other hand, the taxonomy of WG functions has the attraction of open-endedness and flexibility, which contrasts with the HPSG analysis, which assumes a fixed and universal list of dependency types defined by the order of elements in the various categories discussed previously (SPR, COMPS and ARG-ST). A universal list of categories seems to require an explanation: Why a universal list? Why this particular list? How does the list develop in a learner's mind? In contrast, a taxonomy can be learned entirely from experience, can vary across languages, and can accommodate any amount of minor variation. Of these three attractions, the easiest to illustrate briefly is the third. Take once again the English preposition *from*, as in (31).

(31) From London to Edinburgh is four hundred miles.

Here *from* seems to have two complements: *London* and *to Edinburgh*. Since they have different properties, they must be distinguished, but how? The easiest and arguably correct solution is to create a special dependency type just for the second complement of *from*. This is clearly unproblematic in the flexible WG approach, where any number of special dependency types can be added at the foot of the taxonomy, but much harder if every complement must fit into a universal list. So HPSG seems to have a problem here, but on closer inspection this is not the case. First, there is no claim that ARG-ST is universal. For example, Koenig & Michelson (2015) discuss Oneida (Iroquoian) and argue that this language does not have syntactic valence, so it would not make sense to assume an ARG-ST list; it follows that ARG-ST is not universal. (See also Müller (2015) and Borsley & Müller 2021: Section 2.3, Chapter 28 of this volume on the non-assumption of innate language-specific knowledge in HPSG.) Keenan & Comrie (1977) discussed the obliqueness order as a universal tendency, and it plays a role in various phenomena: relativization, case assignment, agreement and pronoun binding (see the chapters on these phenomena by Przepiórkowski 2021, Wechsler 2021 and Müller 2021a); an order is also needed for capturing generalizations about linking (Davis, Koenig & Wechsler 2021). But apart from this there is no label or specific category information attached to, say, the third element in the ARG-ST list. The general setting also allows for subjectless ARG-ST lists, as needed in grammars of German; the respective lexemes would have an object in the first position of the ARG-ST list. English *from* is also unproblematic: the second element in an ARG-ST list can be anything. The relevant specification can be lexeme-specific or specific to a class of lexemes (see the chapters by Sailer (2021) on idioms and by Davis, Koenig & Wechsler (2021) on linking).

To summarise the discussion, therefore, HPSG and WG offer fundamentally different treatments of grammatical functions with two particularly salient differences. In the treatment of adjuncts, there are reasons for preferring the WG approach in which adjuncts and arguments are grouped together explicitly as dependents. But in distinguishing different types of complement, the HPSG lists seem to complement the taxonomy of WG, each approach offering different benefits. This is clearly an area needing further research.

# **6 HPSG without PS?**

This chapter on HPSG and DS raises a fundamental question for HPSG: does it really need PS? Most introductory textbooks present PS as an obvious and established approach to syntax, but it is only obvious because these books ignore the DS alternative: the relative pros and cons of the two approaches are rarely assessed. Even if PS is in fact better than DS, this can't be described as "established" (in the words of one of my reviewers) until its superiority has been demonstrated. This hasn't yet happened. The historical sketch showed very clearly that nearly two thousand years of syntactic theory assumed DS, not PS, with one exception: the subject-predicate analysis of the proposition (later taken to be the sentence). Even when PS was invented by Bloomfield, it was combined with elements of DS, and Chomsky's PS, purified of all DS elements, only survived from 1957 to 1970.

A reviewer also argues that HPSG is vindicated by the many large-scale grammars that use it (see also Bender & Emerson (2021: Section 3), Chapter 25 of this volume for an overview). These grammars are indeed impressive, but DS theories have also been implemented in the equally large-scale projects listed in Section 2. In any case, the question is not whether HPSG is a good theory, but rather whether it might be even better without its PS assumptions. The challenge for HPSG, then, is to explain why PS is a better basis than DS. The debate has hardly started, so its outcome is unpredictable; but suppose the debate favoured DS. Would that be the end of HPSG? Far from it. It could survive almost intact, with just two major changes.

The first would be in the treatment of grammatical functions. It would be easy to bring all dependents together in a list called DEPS (Bouma et al. 2001) with ADJUNCTS and COMPS as sub-lists, or even with a separate subcategory for each sub-type of dependent (Hellan 2017).


The other change would be the replacement of phrasal boxes by a single list of words. (32) gives a list for the example with which we started (with round and curly brackets for ordered and unordered sets, and a number of sub-tokens for each word):

(32) (*many*, *many*+*h*, *students*, *students*+*a*, *enjoy*, *enjoy*+*o*, *enjoy*+*s*, *syntax*)

Each word in this list stands for a whole box of attributes which include syntactic dependency links to other words in the list. The internal structure of the boxes would otherwise look very much like standard HPSG, as in the schematic neo-HPSG structure in Figure 17. (To improve readability by minimizing crossing lines, attributes and their values are separated as usual by a colon, but may appear in either order.)
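As a rough illustration of what such a list of word-boxes might look like as a data structure, here is a Python sketch. The attribute names (`form`, `deps`) and the particular dependency labels and directions are assumptions of the sketch, not Hudson's analysis.

```python
# A sentence as an ordered list of word-boxes (dicts), each with
# dependency links into the same list; there are no phrasal boxes.

sentence = [
    {"form": "many", "deps": {}},
    {"form": "students", "deps": {"adjunct": 0}},     # many modifies students
    {"form": "enjoy", "deps": {"subject": 1, "object": 3}},
    {"form": "syntax", "deps": {}},
]

for word in sentence:
    for function, index in word["deps"].items():
        print(word["form"], "has", function, sentence[index]["form"])
# students has adjunct many
# enjoy has subject students
# enjoy has object syntax
```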

Figure 17: A neo-HPSG analysis

Figure 17 can be read as follows:

• The items at the bottom of the structure (*many*, *students*, *enjoy* and *syntax*) are basic types stored in the grammar, available for modification by the dependencies. These four words are the basis for the ordered set in (32), and are shown here by the round brackets, with the ordering shown by the left-right dimension. This list replaces the ordered partonomy of HPSG.



Roughly speaking, each boxed item in this diagram corresponds to an AVM in a standard HPSG analysis.

In short, modern HPSG could easily be transformed into a version of DS, with a separate AVM for each word. As in DS, the words in a sentence would be represented as an ordered list interrelated partly by the ordering and partly by the pairwise dependencies between them. This transformation is undeniably possible. Whether it is desirable remains to be established by a programme of research and debate which will leave the theory more robust and immune to challenge.

# **Abbreviations**

NM non-masc. (class II–IV)
II noun class II
IV noun class IV
PROP proprietive

# **Acknowledgements**

I would like to take this opportunity to thank Stefan Müller for his unflagging insistence on getting everything right.

# **References**

Abeillé, Anne & Robert D. Borsley. 2008. Comparative correlatives and parameters. *Lingua* 118(8). 1139–1157. DOI: 10.1016/j.lingua.2008.02.001.







# **Chapter 32**

# **HPSG and Construction Grammar**

# Stefan Müller

Humboldt-Universität zu Berlin

This chapter discusses the main tenets of Construction Grammar (CxG) and shows that HPSG adheres to them. The discussion includes surface orientation, language acquisition without UG, and inheritance networks and shows how HPSG (and other frameworks) are positioned along these dimensions. Formal variants of CxG will be briefly discussed and their relation to HPSG will be pointed out. It is argued that lexical representations of valence are more appropriate than phrasal approaches, which are assumed in most variants of CxG. Other areas of grammar seem to require headless phrasal constructions (e.g., the NPN construction and certain extraction constructions) and it is shown how HPSG handles these. Derivational morphology is discussed as a further example of an early constructionist analysis in HPSG.

This chapter deals with Construction Grammar (CxG) and its relation to Head-Driven Phrase Structure Grammar (HPSG). The short version of the message is: HPSG is a Construction Grammar.<sup>1</sup> It had constructional properties right from the beginning and over the years – due to influence by Construction Grammarians like Fillmore and Kay – certain aspects were adapted, making it possible to better capture generalizations over phrasal patterns. In what follows I will first say what Construction Grammars are (Section 1), and I will explain why HPSG as developed in Pollard & Sag (1987; 1994) was a Construction Grammar and how it was changed to become even more Constructive (Section 1.2.3). Section 2 deals with so-called argument structure constructions, which are usually dealt with by assuming phrasal constructions in CxG, and explains why this is problematic and why lexical approaches are more appropriate. Section 3 explains Construction Morphology, Section 4 shows how cases that should be treated phrasally can be handled in HPSG, and Section 5 sums up the chapter.

<sup>1</sup>This does not mean that HPSG is not a lot of other things at the same time. For instance, it is also a Generative Grammar in the sense of Chomsky (1965: 4), that is, it is explicit and formalized. HPSG is also very similar to Categorial Grammar (Müller 2013; Kubota 2021, Chapter 29 of this volume). Somewhat ironically, Head-Driven Phrase Structure Grammar is not entirely head-driven anymore (see Section 4.1), nor is it a phrase structure grammar (Richter 2021, Chapter 3 of this volume).

Stefan Müller. 2021. HPSG and Construction Grammar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), *Head-Driven Phrase Structure Grammar: The handbook*, 1497–1553. Berlin: Language Science Press. DOI: 10.5281/zenodo.5599882

# **1 What is Construction Grammar?**

Construction Grammar was developed as a theory that can account for nonregular phenomena as observed in many idioms (Fillmore, Kay & O'Connor 1988). It clearly set itself apart from theories like Government & Binding (Chomsky 1981), which assumes very abstract schemata for the combination of lexical items (X rules). The argument was that grammatical constructions are needed to capture irregular phenomena and their interaction with more regular ones. In contrast, Chomsky (1981: 7) considered rules for passive or relative clauses as epiphenomenal; everything was supposed to follow from general principles.<sup>2</sup> According to Chomsky, grammars consisted of a set of general combinatorial rules and some principles. The Minimalist Program (Chomsky 1995) is even more radical, since only two combinatorial rules are left (External and Internal Merge). Various forms of CxG object to this view and state that several very specific phrasal constructions are needed in order to account for language in its entirety and full complexity. Phenomena for which this is true will be discussed in Section 4. However, the case is not as clear in general, since one of the authors of Fillmore, Kay & O'Connor (1988) codeveloped a head-driven, lexical theory of idioms that is entirely compatible with the abstract rules of Minimalism (Sag 2007; Kay, Sag & Flickinger 2015; Kay & Michaelis 2017). This theory will be discussed in Section 1.3.2.1. Of course, the more recent lexical theory of idioms is a constructional theory as well. So the first question to answer in a chapter like this is: what is a construction in the sense of Construction Grammar? What is Construction Grammar? While it is relatively clear what a construction is, the answer to the question regarding Construction Grammar is less straight-forward (see also Fillmore 1988: 35 on this). Section 1.1 provides the definition for the term *construction*

<sup>2</sup>The passive in GB is assumed to follow from suppression of case assignment and the Case Filter, which triggers movement of the object to SpecIP. The important part of the analysis is the combination of the verb stem with the passive morphology. This is where suppression of case assignment takes place. This morphological part of the analysis corresponds to the Passive Construction in theories like HPSG and SBCG: a lexical rule (Pollard & Sag 1987: 215; Müller 2003a; Müller & Ørsnes 2013; Davis, Koenig & Wechsler 2021: Section 5.3, Chapter 9 of this volume). So in a sense there is a Passive Construction in GB as well.



### **1.1 What is a construction?**

Fillmore, Kay & O'Connor (1988) discuss sentences like (1) and notice that they pose puzzles for standard accounts of syntax and the syntax/semantics interface.

(1) a. The more carefully you do your work, the easier it will get.
 b. I wouldn't pay five dollars for it, let alone ten dollars.

The *the-er the-er* Construction is remarkable since it combines aspects of normal syntax (clause structure and extraction) with idiosyncratic aspects like the special use of *the*. In (1a), the adverb phrase *more carefully* does not appear in its usual position after *work* but is fronted, and *the* appears without a noun. The second clause in (1a) is structured in a parallel way. There have to be two of these *the* clauses to form the respective construction. Fillmore, Kay & O'Connor (1988) extensively discuss the properties of *let alone*, which are interesting for syntactic reasons (the fragments following *let alone*) and for semantic and information-structural reasons. I will not repeat the discussion here but refer the reader to the paper.<sup>3</sup>

In later papers, examples like (2) were discussed:

(2) a. What is this scratch doing on the table?
 b. Frank dug his way out of the prison.


Again, the semantics of the complete sentences is not in an obvious relation to the material involved. The question in (2a) is not about a scratch's actions, but rather the question is why there is a scratch. Similarly, (2b) is special in that there is a directional PP that does not normally go together with verbs like *dug*. It is licensed by *way* in combination with a possessive pronoun.

Fillmore et al. (1988), Goldberg (1995), Kay & Fillmore (1999) and Construction Grammarians in general argue that the notion of "construction" is needed for adequate models of grammar, that is, for models of grammar that are capable of analyzing the examples above. Fillmore et al. (1988: 501) define *construction* as follows:

Constructions on our view are much like the nuclear family (mother plus daughters) subtrees admitted by phrase structure rules, EXCEPT that (1) constructions need not be limited to a mother and her daughters, but may span wider ranges of the sentential tree; (2) constructions may specify, not only syntactic, but also lexical, semantic, and pragmatic information; (3) lexical items, being mentionable in syntactic constructions, may be viewed, in many cases at least, as constructions themselves; and (4) constructions may be idiomatic in the sense that a large construction may specify a semantics (and/or pragmatics) that is distinct from what might be calculated from the associated semantics of the set of smaller constructions that could be used to build the same morphosyntactic object. (Fillmore et al. 1988: 501)

<sup>3</sup>For an analysis of comparative correlative constructions as in (1a) in HPSG, see Abeillé & Chaves (2021: Section 3.3), Chapter 16 of this volume and the papers cited there.

A similar definition can be found in Goldberg's work. Goldberg (2006: 5) defines construction as follows:

Any linguistic pattern is recognized as a construction as long as some aspect of its form or function is not strictly predictable from its component parts or from other constructions recognized to exist. In addition, patterns are stored as constructions even if they are fully predictable as long as they occur with sufficient frequency. (Goldberg 2006: 5)

The difference between this definition and earlier definitions by her and others is that patterns that are stored because of their frequencies are included. This addition is motivated by psycholinguistic findings that show that forms may be stored even though they are fully regular and predictable (Bybee 1995; Pinker & Jackendoff 2005: 228).

Goldberg provides Table 32.1 as examples of constructions. In addition to such constructions with a clear syntax-semantics or syntax-function relation, Goldberg (2013: 453) assumes a rather abstract VP construction specifying "statistical constraints on the ordering of postverbal complements, dependent on weight and information structure".

Table 32.1: Examples of constructions, varying in size and complexity (based on Goldberg 2006: 5)

| Construction | Example |
|---|---|
| Morpheme | *pre-*, *-ing* |
| Word | *avocado*, *anaconda*, *and* |
| Complex word | *daredevil*, *shoo-in* |
| Idiom (filled) | *going great guns*, *give the Devil his due* |
| Idiom (partially filled) | *jog ⟨someone's⟩ memory* |
| Covariational conditional | *The Xer the Yer* (e.g., *the more you think about it, the less you understand*) |
| Ditransitive (double object) | Subj V Obj1 Obj2 (e.g., *he gave her a fish taco*) |
| Passive | Subj aux VPpp (PPby) (e.g., *the armadillo was hit by a car*) |

If one just looks at Goldberg's definition of construction, all theories currently on the market could be regarded as Construction Grammars. As Peter Staudacher pointed out in the discussion after a talk by Knud Lambrecht in May 2006 in Potsdam, lexical items are form-meaning pairs, and the rules of phrase structure grammars come with specific semantic components as well, even if it is just functional application. So, Categorial Grammar, GB-style theories paired with semantics (Heim & Kratzer 1998), GPSG, TAG, LFG, HPSG, and even Minimalism would be Construction Grammars. If one looks at the examples of constructions in Table 32.1, things change a bit. Idioms are generally not the focus of work in Mainstream Generative Grammar (MGG).<sup>4</sup> MGG is usually concerned with explorations of the so-called Core Grammar as opposed to the Periphery, to which the idioms are assigned. The Core Grammar is the part of the grammar that is supposed to be acquired with the help of innate domain-specific knowledge, something whose existence Construction Grammar denies. But if one takes Hauser, Chomsky & Fitch (2002) seriously and assumes that only the ability to form complex linguistic objects out of less complex linguistic objects (Merge) is part of this innate knowledge, then the core/periphery distinction does not have much content, and Minimalists could after all adopt a version of Sag's local, selection-based analysis of idioms (Sag 2007; Kay, Sag & Flickinger 2015; Kay & Michaelis 2017), and in fact some did: Everaert (2010) and G. Müller (2011: 21).<sup>5</sup> However, as is discussed in the next subsection, there are other aspects that really set Construction Grammar apart from MGG.

### **1.2 Basic tenets of Construction Grammar**

Goldberg (2003: 219) names the following tenets as core assumptions standardly made in CxG:

*Tenet 1* All levels of description are understood to involve pairings of form with semantic or discourse function, including morphemes or words, idioms, partially lexically filled and fully abstract phrasal patterns. (See Table 32.1.)

<sup>4</sup>The term *Mainstream Generative Grammar* is used to refer to work in Transformational Grammar, for example Government & Binding (Chomsky 1981) and Minimalism (Chomsky 1995). Some authors working in Construction Grammar see themselves in the tradition of Generative Grammar in a wider sense, see for example Fillmore, Kay & O'Connor (1988: 501) and Fillmore (1988: 36).

<sup>5</sup>See also Sailer (2021: Section 4.4), Chapter 17 of this volume on lexical approaches to idioms.

### Stefan Müller


*Tenet 2* An emphasis is placed on subtle aspects of the way we construe events and states of affairs.

*Tenet 3* A "what you see is what you get" approach to syntactic form is adopted: no underlying levels of syntax or any phonologically empty elements are posited.

*Tenet 4* Constructions are understood to be learned on the basis of the input and general cognitive mechanisms (they are constructed) and are expected to vary cross-linguistically.

*Tenet 5* Cross-linguistic generalizations are explained by appeal to general cognitive constraints together with the functions of the constructions involved.

*Tenet 6* Language-specific generalizations across constructions are captured via inheritance networks, much like those that have long been posited to capture our non-linguistic knowledge.

*Tenet 7* The totality of our knowledge of language is captured by a network of constructions: a "constructicon".

I already commented on Tenet 1 above. Tenet 2 concerns semantics and the syntax-semantics interface, which are part of most HPSG analyses. In what follows, I want to look in more detail at the other tenets. Something that is not mentioned in Goldberg's tenets but is part of the definition of construction by Fillmore et al. (1988: 501) is the non-locality of constructions. I will comment on this in a separate subsection.

#### **1.2.1 Surface orientation and empty elements**

Tenet 3 requires a surface-oriented approach. Underlying levels and phonologically empty elements are ruled out. This excludes derivational models of transformational syntax assuming an underlying structure (the so-called D-structure) and some derived structure, as well as more recent derivational variants of Minimalism. There was a time when representational models of Government & Binding (GB, Chomsky 1981) did not assume a D-structure but just one structure with traces (Koster 1978; 1987: 235; Kolb & Thiersch 1991; Haider 1993: Section 1.4; Frey 1993: 14; Lohnstein 1993: 87–88, 177–178; Fordham & Crocker 1994: 38; Veenstra 1998: 58). Some of these analyses are rather similar to HPSG analyses as they are assumed today (Kiss 1995; Bouma & van Noord 1998; Meurers 2000; Müller 2005; 2021b,c). Chomsky's Minimalist work (Chomsky 1995) assumes a derivational model and comes with a rhetoric of building structure in a bottom-up way and sending complete Phases to the interfaces for pronunciation and interpretation. This is incompatible with Tenet 3, but in principle, Minimalist approaches are very similar to Categorial Grammar, so there could be representational approaches adhering to Tenet 3.<sup>6</sup>

A comment on empty elements is in order: all articles introducing Construction Grammar state that CxG does not assume empty elements. Most of the alternative theories do use empty elements: see König (1999) on Categorial Grammar, Gazdar, Klein, Pullum & Sag (1985: 143) on GPSG, Bresnan (2001: 67) on LFG, and Bender (2001) and Sag, Wasow & Bender (2003: 464) on HPSG/Sign-Based Construction Grammar. There are results from the 60s that show that phrase structure grammars containing empty elements can be translated into grammars that do not contain empty elements (Bar-Hillel, Perles & Shamir 1961: 153, Lemma 4.1), and sure enough there are versions of GPSG (Uszkoreit 1987: 76–77), LFG (Kaplan & Zaenen 1989; Dalrymple et al. 2001), and HPSG (Bouma et al. 2001; Sag 2010: 508) that do not use empty elements. Grammars with empty elements are often more compact than those without and express generalizations more directly. See for example Bender (2001) on copulaless sentences in African American Vernacular English and Müller (2004) on nounless NPs in German. The argument against empty elements usually refers to language acquisition: it is argued that empty elements cannot be learned since they are not detectable in the input. However, if an empty element alternates with visible material, it can be argued that what is learned is the fact that a certain element can be left out. What is true, though, is that things like empty expletives cannot be learned, since these empty elements are neither visible nor do they contribute to meaning. Their only purpose in grammars is to maintain uniformity. For example, Grewendorf (1995), working in GB, suggests an analysis of the passive in German that is parallel to the movement-based analysis of English passives (Chomsky 1981: 124). In order to account for the fact that the subject does not move to initial position in German, he suggests an empty expletive pronoun that takes the subject position and that is connected to the original non-moved subject. Such elements cannot be acquired without innate knowledge about the IP/VP system and constraints about the obligatory presence of subjects. The CxG criticism is justified here.

<sup>6</sup>There is a variant of Minimalist Grammars (Stabler 2011), namely Top-down Phase-based Minimalist Grammar (TPMG) as developed by Chesi (2004; 2007) and Bianchi & Chesi (2006; 2012). There is no movement in TPMG. Rather, *wh*-phrases are linked to their "in situ" positions with the aid of a short-term memory buffer that functions like a stack. See also Hunter (2010; 2019) for a related account where the information about the presence of a *wh*-phrase is percolated in the syntax tree, like in GPSG/HPSG. For a general comparison of Minimalist grammars and HPSG, see Müller (2013: Section 2.3) and Müller (2020: 177–180), which includes the discussion of a more recent variant suggested by Torr (2019).

A frequent argument for empty elements in MGG is based on the fact that there are overt realizations of an element in other languages (e.g., object agreement in Basque and focus markers in Gungbe). But since there is no language-internal evidence for these empty elements, they cannot be learned, and one would have to assume that they are innate. This kind of empty element is rightly rejected (by proponents of CxG and others).

Summing up, it can be said that all grammars can be turned into grammars without empty elements and hence fulfill Tenet 3. It was argued that the reason for assuming Tenet 3 (problems in language acquisition) should be reconsidered and that a weaker form of Tenet 3 should be assumed: empty elements are forbidden unless there is language-internal evidence for them. This revised version of Tenet 3 would allow one to count the empty-element versions of CG, GPSG, LFG, and HPSG among Construction Grammars.

#### **1.2.2 Language acquisition without the assumption of UG**

Tenets 4 and 5 are basically what everybody in MGG should assume if Hauser, Chomsky & Fitch (2002) are taken seriously. Of course, this is not what is done in large parts of the field. The most extreme variant is Cinque & Rizzi (2010), who assume at least 400 functional heads (p. 57) as part of Universal Grammar (UG), present in the grammars of all languages, although sometimes invisibly (p. 55). Such assumptions raise the question of why the genders of Bantu languages should be part of our genome and how they got there. Researchers working on language acquisition realized that the Principles & Parameters approach (Meisel 1995) makes wrong predictions. They now talk about Micro-Cues instead of parameters (Westergaard 2014), and these Micro-Cues are just features that can be learned. However, Westergaard still assumes that the features are determined by UG, a dubious assumption seen from a CxG perspective (and from the perspective of Hauser, Chomsky & Fitch and genetics in general; Bishop 2002).

Note that even those versions of Minimalism that do not follow Rizzi-style Cartographic approaches are far from being minimalist in their assumptions. Some distinguish between strong and weak features, some assume enumerations of lexical items from which a particular derivation draws its input, and some assume that all movement has to be feature-driven. Still others assume that derivations work in so-called Phases and that a Phase, once completed, is "shipped to the interfaces". Construction of Phases is bottom-up, which is incompatible with psycholinguistic results (see also Borsley & Müller 2021: Section 5.1, Chapter 28 of this volume). None of these assumptions is a natural one to make from a language acquisition point of view. Most of these assumptions do not have any empirical motivation; the only motivation usually given is that they result in "restrictive theories". But if there is no motivation for them, this means that the respective architectural assumptions have to be part of our innate domain-specific knowledge, which is implausible according to Hauser, Chomsky & Fitch (2002).

As research in computational linguistics shows, our input is rich enough to form classes, to determine the part of speech of lexical items, and even to infer syntactic structure thought to be underdetermined by the input. For instance, Bod (2009) shows that the classical auxiliary inversion examples that Chomsky still uses in his Poverty of the Stimulus arguments (Chomsky 1971: 29–33; Berwick, Pietroski, Yankama & Chomsky 2011) can also be learned from language input available to children. See also Freudenthal et al. (2006; 2007) on input-based language acquisition.

HPSG does not make any assumptions about complicated mechanisms like feature-driven movement and so on. HPSG states properties of linguistic objects like part of speech, case, gender, etc., and states relations between features like agreement and government. In this respect it is like other Construction Grammars and hence experimental results regarding theories of language acquisition can be carried over to HPSG. See also Borsley & Müller (2021: Section 5.2), Chapter 28 of this volume on language acquisition.

#### **1.2.3 Inheritance networks**

This leaves us with Tenets 6 and 7, that is, *inheritance networks* and the constructicon. Inheritance is something that is used in the classification of knowledge. For example, the word *animal* is very general and refers to entities with certain properties. There are subtypes of this kind of entity: *mammal* and further subtypes like *mouse*. In inheritance hierarchies, the knowledge of superconcepts is not restated at the subconcepts but instead, the superconcept is referred to. This is like Wikipedia: the Wikipedia entry of *mouse* states that mice are mammals without listing all the information that comes with the concept of mammal. Such inheritance hierarchies can be used in linguistics as well. They can be used to classify roots, words, and phrases. An example of such a hierarchy used for the classification of adjectives and adjectival derivation is discussed in Section 3. See also Davis & Koenig (2021: Section 4), Chapter 4 of this volume on inheritance in the lexicon.
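The mechanics of inheritance can be made concrete with a few lines of code. The following is a minimal sketch using Python's class system; the classes and attributes are my own illustration of the *animal*/*mammal*/*mouse* example, not part of any HPSG formalism:

```python
# Constraints stated on a superconcept are not restated on its subconcepts;
# they are inherited, just as the entry for *mouse* need not repeat what
# holds of mammals in general.

class Animal:
    animate = True                  # holds of all animals

class Mammal(Animal):
    nurses_young = True             # added at the subtype; 'animate' is inherited

class Mouse(Mammal):
    size = "small"                  # only mouse-specific information is stated

m = Mouse()
print(m.animate, m.nurses_young, m.size)   # -> True True small
```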

MGG does not make reference to inheritance hierarchies. HPSG did this right from the beginning in 1985 (Flickinger, Pollard & Wasow 1985) for lexical items and since 1995 also for phrasal constructions (Sag 1997). LFG rejected the use of types but used macros in computer implementations. The macros were abbreviatory devices specific to the implementation and did not have any theoretical importance. This changed in 2004, when macros were suggested in theoretical work (Dalrymple, Kaplan & King 2004). And although any connection to constructionist work is vehemently denied by some of the authors, recent work in LFG has a decidedly constructional flavor (Asudeh, Dalrymple & Toivonen 2008; 2014).<sup>7</sup> LFG differs from frameworks like HPSG, though, in assuming a separate level of c-structure. c-structure rules are basically context-free phrase structure rules, and they are not modeled by feature value pairs (but there is a model-theoretic formalization; Kaplan 1995: 12). This means that it is not possible to capture a generalization regarding lexical items, lexical rules, and phrasal schemata, or any two-element subset of these three kinds of objects. While HPSG describes all of these elements with the same inventory and hence can use common supertypes in the description of all three, this is not possible in LFG (Müller 2018b: Section 23.1).<sup>8</sup> For example, Höhle (1997) argued that complementizers and finite verbs in initial position in German form a natural class. HPSG can capture this since complementizers (lexical elements) and finite verbs in initial position (results of lexical rule applications or a phrasal schema, see Müller 2021a: Section 5.1, Chapter 10 of this volume) can have a common supertype. TAG also uses inheritance, in the Meta Grammar (Lichte & Kallmeyer 2017).

Since HPSG's lexical entries, lexical rules, and phrasal schemata are all described by typed feature descriptions, one could call the set of these descriptions the constructicon. Therefore, Tenet 7 is also adhered to.

#### **1.2.4 Non-locality**

Fillmore, Kay & O'Connor (1988: 501) stated in their definition of constructions that constructions may involve more than mothers and immediate daughters (see the definition quoted in Section 1.1 above).<sup>9</sup> That is, daughters of daughters can be specified as well. A straightforward example of such a specification is given in Figure 1, which shows the TAG analysis of the idiom *take into account* following Abeillé & Schabes (1989: 7). The fixed parts of the idiom are just stated in the tree. NP↓ stands for an open slot into which an NP has to be inserted. The subscript NA says that adjunction to the respectively marked nodes is forbidden. Theories like Constructional HPSG can state complex tree structures just like TAG can. Dominance relationships are modeled by feature structures in HPSG, and it is possible to have a description that corresponds to Figure 1. The NP slots would just be left underspecified and can be filled in models that are total (see Richter 2007 and Richter 2021, Chapter 3 of this volume for formal foundations of HPSG).

Figure 1: TAG tree for *take into account* by Abeillé & Schabes (1989: 7)

<sup>7</sup>See Toivonen (2013: 516) for an explicit reference to construction-specific phrase structure rules in the sense of Construction Grammar. See Müller (2018a) for a discussion of phrasal LFG approaches.

<sup>8</sup>One could use templates (Dalrymple et al. 2004; Asudeh et al. 2013) to specify properties of lexical items and of mother nodes in c-structure rules, but usually c-structure rules specify the syntactic categories of mothers and daughters, so this information has a special status within the c-structure rules.
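For concreteness, the elementary tree of Figure 1 can be written down as a nested structure. The following sketch uses plain Python tuples; the `(label, daughters)` encoding and the `↓`/`(NA)` markers are my rendering of the notation, not code from any TAG system:

```python
# The fixed parts of the idiom are stated directly in the tree: "NP↓" marks a
# substitution slot for an NP, and "(NA)" marks a node where adjunction is
# forbidden.

take_into_account = (
    "S", [
        ("NP↓", []),                           # subject slot
        ("VP(NA)", [
            ("V(NA)", ["take"]),
            ("NP↓", []),                       # object slot
            ("PP(NA)", [("P", ["into"]),
                        ("N", ["account"])]),
        ]),
    ],
)
```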

It is not without irony that the theoretical approach that was developed out of Berkeley Construction Grammar and Constructional HPSG, namely Sign-Based Construction Grammar (Sag, Boas & Kay 2012; Sag 2012), is strongly local: it is made rather difficult to access daughters of daughters (Sag 2007). So, if one stuck to the early definition, this would rule out SBCG as a Construction Grammar. Fortunately, this is not justified. First, there are ways to establish nonlocal selection (see Section 1.3.2.1) and second, there are ways to analyze idioms locally. Sag (2007), Kay, Sag & Flickinger (2015), and Kay & Michaelis (2017) develop a theory of idioms that is entirely based on local selection.<sup>10</sup> For example, for *take into account*, one can state that *take* selects two NPs and a PP with the fixed lexical material *into* and *account*. The right form of the PP is enforced by means of the feature LEXICAL IDENTIFIER (LID). A special word *into* with the LID value *into* is specified as selecting a special word *account*. What is done in TAG via direct specification is done in SBCG via a series of local selections of specialized lexical items. The interesting (intermediate) conclusion is: if SBCG can account for idioms via local selection, then theories like Categorial Grammar and Minimalism can do so as well. So, they cannot be excluded from Construction Grammars on the basis of arguments concerning idioms and non-locality of selection.

<sup>9</sup>This subsection is based on a much more thorough discussion of locality and SBCG in Müller (2016: Section 10.6.2.1.1 and Section 18.2).
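The selection chain just described can be spelled out in a few lines. This is a sketch under my own simplified encoding (plain dictionaries with CAT, LID, and COMPS features), not the actual SBCG type system:

```python
# Local selection only: *take* constrains the LID of its PP complement, and
# the special *into* constrains the LID of its N complement; no constraint
# ever looks deeper than one level.

account = {"FORM": "account", "CAT": "N", "LID": "account"}
into    = {"FORM": "into", "CAT": "P", "LID": "into",
           "COMPS": [{"CAT": "N", "LID": "account"}]}

def satisfies(description, sign):
    # a sign satisfies a description if all mentioned features match
    return all(sign.get(k) == v for k, v in description.items())

assert satisfies(into["COMPS"][0], account)    # *into* selects *account* locally

# the PP headed by *into* passes its LID up; *take* selects it locally
pp   = {"FORM": "into account", "CAT": "PP", "LID": into["LID"]}
take = {"FORM": "take", "CAT": "V",
        "COMPS": [{"CAT": "NP"},               # the open NP slot
                  {"CAT": "PP", "LID": "into"}]}
assert satisfies(take["COMPS"][1], pp)
```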

However, there may be cases of idioms that cannot be handled via local selection. For example, Richter & Sailer (2009) discuss the following idiom:

(3) glauben, X_Acc tritt ein Pferd
 believe X kicks a horse
 'be utterly surprised'

The X-constituent has to be a pronoun that refers to the subject of the matrix clause. If this is not the case, the sentence becomes ungrammatical or loses its idiomatic meaning.

(4) a. Ich glaub, mich tritt ein Pferd.
 I believe me kicks a horse
 'I am utterly surprised.'
 b. Jonas glaubt, ihn tritt ein Pferd.<sup>11</sup>
 Jonas believes him kicks a horse
 'Jonas is utterly surprised.'
 c. # Jonas glaubt, dich tritt ein Pferd.
 Jonas believes you kicks a horse
 'Jonas believes that a horse kicks you.'

Richter & Sailer (2009: 313) argue that the idiomatic reading is only available if the accusative pronoun is fronted and the embedded clause is V2. The examples in (5) do not have the idiomatic reading:

(5) a. Ich glaube, dass mich ein Pferd tritt.
 I believe that me a horse kicks
 'I believe that a horse kicks me.'

<sup>10</sup>Of course this theory is also compatible with any other variant of HPSG. As Flickinger, Pollard & Wasow (2021: 69), Chapter 2 of this volume point out, it was part of the grammar fragment that has been developed at the CSLI by Dan Flickinger (Flickinger, Copestake & Sag 2000; Flickinger 2000; 2011) years before the SBCG manifesto was published.

<sup>11</sup>http://www.machandel-verlag.de/readerview/der-katzenschatz.html, 2021-01-29.


b. Ich glaube, ein Pferd tritt mich.
 I believe a horse kicks me
 'I believe that a horse kicks me.'

They develop an analysis with a partly fixed configuration and some open slots, similar in spirit to the TAG analysis in Figure 1. However, their restrictions on *Pferd* clauses are too strict since there are variants of the idiom that do not have the accusative pronoun in the *Vorfeld*:

(6) ich glaub es tritt mich ein Pferd wenn ich einen derartigen Unsinn lese.<sup>12</sup>
 I believe EXPL kicks me a horse when I a such nonsense read
 'I am utterly surprised when I read such nonsense.'

So it might be the case that the organization of the embedded clause can be stated clause-internally, and hence it is an open question whether there are idioms that make nonlocal Constructions necessary.

What is not an open empirical question, though, is whether humans store chunks with complex internal structure or not. It is clear that we do, and much Construction Grammar literature emphasizes this. Constructional HPSG can represent such chunks directly in the theory, but SBCG cannot, since linguistic signs do not have daughters. So here, Constructional HPSG and TAG are the theories that can represent complex chunks of linguistic material with its internal structure, while other theories like GB, Minimalism, CG, LFG, SBCG, and DG cannot.

#### **1.2.5 Summary**

If all these points are taken together, it is clear that most variants of MGG are not Construction Grammars. However, CxG had considerable influence on other frameworks so that there are constructionist variants of LFG, HPSG, and TAG. HPSG in the version of Sag (1997) (also called Constructional HPSG) and the HPSG dialect Sign-Based Construction Grammar are Construction Grammars that follow all the tenets mentioned above.

### **1.3 Variants of Construction Grammar**

The previous section discussed the tenets of CxG and to what degree other frameworks adhere to them. This section deals with frameworks that have Construction Grammar explicitly in their name or are usually grouped among Construction Grammars:

- Berkeley Construction Grammar
- Goldbergian/Lakovian Construction Grammar
- Cognitive Grammar
- Radical Construction Grammar
- Embodied Construction Grammar
- Fluid Construction Grammar
- Sign-Based Construction Grammar

<sup>12</sup>http://www.welt.de/wirtschaft/article116297208/Die-verlogene-Kritik-an-den-Steuerparadiesen.html, commentary section, 2018-02-20.
Berkeley Construction Grammar, Embodied Construction Grammar, Fluid Construction Grammar, and Sign-Based Construction Grammar are the ones that are more formal. All of these variants use feature value pairs and are constraint-based. They are sometimes also referred to as unification-based approaches. Berkeley Construction Grammar never had a consistent formalization. The variant of unification assumed by Kay & Fillmore (1999) was formally inconsistent (Müller 2006a: Section 2.4) and the computation of construction-like objects (CLOs) suggested by Kay (2002) did not work either (Müller 2006a: Section 3). Berkeley Construction Grammar was dropped by the authors, who joined forces with Ivan Sag and Laura Michaelis and eventually came up with an HPSG variant named Sign-Based Construction Grammar (Sag 2012). The differences between Constructional HPSG (Sag 1997) and SBCG are to some extent cosmetic: semantic relations got the suffix *-fr* for *frame* (*like-rel* became *like-fr*), phrases were called constructions (*hd-subj-ph* became *subj-head-cxt*), and lexical rules were called *derivational constructions*.<sup>13</sup> While this renaming would not have changed anything in terms of expressiveness of theories, there was another change that was not motivated by any of the tenets of Construction Grammar but rather by the wish to get a more restrictive theory: Sag, Wasow & Bender (2003) and Sag (2007) changed the feature geometry of phrasal signs in such a way that signs do not contain daughters. The information about mother-daughter relations is contained in lexical rules and phrasal schemata (constructions) only. The phrasal schemata are more like GPSG immediate dominance schemata (phrase structure rules without constraints on the order of the daughters) in licensing a mother node when certain daughters are present, but without the daughters being represented as part of the AVM that stands for the mother node, as was common in HPSG from 1985 until Sag, Wasow & Bender (2003).<sup>14</sup> This differs quite dramatically from what was done in Berkeley Construction Grammar, since BCxG explicitly favored a non-local approach (Fillmore 1988: 37; Fillmore et al. 1988: 501). Arguments were not canceled but passed up to the mother node. Adjuncts were passed up as well, so that the complete internal structure of an expression is available at the top-most node (Kay & Fillmore 1999: 9). The advantage of BCxG and Constructional HPSG (Sag 1997) is that complex expressions (e.g., idioms and other more transparent expressions with high frequency) can be stored as chunks containing the internal structure. This is not possible with SBCG, since phrasal signs never contain internal structures. For a detailed discussion of Sign-Based Construction Grammar see Section 1.3.2 and Müller (2016: Section 10.6.2).

<sup>13</sup>This renaming trick was so successful that it even confused some of the co-editors of the volume about SBCG (Boas & Sag 2012). See for example Boas (2014) and the reply in Müller & Wechsler (2014b).

Embodied Construction Grammar (Bergen & Chang 2005) uses typed feature descriptions for the description of linguistic objects and allows for discontinuous constituents. As argued by Müller (2016: Section 10.6.3), it is a notational variant of Reape-style HPSG (Reape 1994; see also Müller 2021a: Section 6, Chapter 10 of this volume for discontinuous constituents in HPSG).

Fluid Construction Grammar is also rather similar to HPSG. An important difference is that FCG attaches weights to constraints, something that is usually not done in HPSG. But in principle, there is nothing that forbids adding weights to HPSG as well, and in fact it has been done (Brew 1995; Briscoe & Copestake 1999; Miyao & Tsujii 2008), and it should be done to a larger extent (Miller 2013). Van Trijp (2013) tried to show that Fluid Construction Grammar is fundamentally different from SBCG, but I think he failed in every single respect. See Müller (2017) for a detailed discussion, which cannot be repeated here for space reasons.

In what follows I will compare Constructional HPSG (as assumed in this volume) with SBCG.

#### **1.3.1 Constructional HPSG**

As is discussed in other chapters in more detail (Richter 2021: Section 2; Abeillé & Borsley 2021: Section 3), HPSG uses feature value pairs to model linguistic objects. One important tool is structure sharing. For example, determiner, adjective, and noun agree with respect to certain features in languages like German. The identity of properties is modeled by identity of feature values, and this identity is established by identifying the values in descriptions. Now, it is obvious that certain features are always shared simultaneously. In order to facilitate the statement of respective constraints, feature value pairs are put into groups. This is why HPSG feature descriptions are very complex. Information about syntax and semantics is represented under SYNTAX-SEMANTICS (SYNSEM), information about syntax under CATEGORY (CAT), and information that is projected along the head path of a projection is represented under HEAD. All feature structures have to have a type. The type may be omitted in the description, but there has to be one in the model. Types are organized in hierarchies. They are written in italics. (7) shows an example lexical item for the word *ate*:<sup>15,16</sup>

$$
\begin{bmatrix}
\textit{word}\\
\text{PHONOLOGY } \langle \textit{ate} \rangle\\
\text{SYNSEM } \begin{bmatrix} \dots \begin{bmatrix} \text{CAT } \begin{bmatrix} \text{HEAD } \begin{bmatrix} \textit{verb}\\ \text{VFORM } \textit{fin} \end{bmatrix}\\ \text{SPR } \langle \text{NP[\textit{nom}]}_{[1]} \rangle\\ \text{COMPS } \langle \text{NP[\textit{acc}]}_{[2]} \rangle \end{bmatrix}\\ \text{CONT } \begin{bmatrix} \textit{eat}\\ \text{EATER } [1]\\ \text{EATEN } [2] \end{bmatrix} \end{bmatrix} \end{bmatrix}
\end{bmatrix}
\tag{7}
$$

<sup>14</sup>The two approaches will be discussed in more detail in Section 1.3.1 and Section 1.3.2.

The information about part of speech and finiteness is bundled under HEAD. The selection of a subject is represented under SPR (sometimes the feature SUBJ is used for subjects) and the non-subject arguments are represented as part of a list under COMPS. The semantic indices [1] and [2] are linked to thematic roles in the semantic representation (for more on linking, see Davis, Koenig & Wechsler 2021, Chapter 9 of this volume).
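Structure sharing as it is used in (7) can be emulated with object identity. The following sketch uses my own dictionary encoding; it illustrates the concept, not any particular implementation:

```python
# Identity of feature values is token identity, not equality of copies: the
# semantic indices [1] and [2] each occur twice in the entry, as one and the
# same object.

index1, index2 = object(), object()

ate = {
    "PHON": ["ate"],
    "SYNSEM": {
        "CAT": {"HEAD": {"POS": "verb", "VFORM": "fin"},
                "SPR":   [{"CASE": "nom", "INDEX": index1}],
                "COMPS": [{"CASE": "acc", "INDEX": index2}]},
        "CONT": {"RELN": "eat", "EATER": index1, "EATEN": index2},
    },
}

cat = ate["SYNSEM"]["CAT"]
assert cat["SPR"][0]["INDEX"] is ate["SYNSEM"]["CONT"]["EATER"]    # linking of [1]
assert cat["COMPS"][0]["INDEX"] is ate["SYNSEM"]["CONT"]["EATEN"]  # linking of [2]
```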

Dominance structures can also be represented with feature value pairs. While Pollard & Sag (1987) and Pollard & Sag (1994) had a DAUGHTERS feature and then certain phrasal types constraining the daughters within the DAUGHTERS feature, Sag (1997) represented the daughters and constraints upon them at the top level of the sign.<sup>17</sup> This move made it possible to have subtypes of the type *phrase*, e.g., *filler-head-phrase*, *specifier-head-phrase*, and *head-complement-phrase*. Generalizations over these types can now be captured within the type hierarchy together with other types for linguistic objects like lexical items and lexical rules (see Section 1.2.3). (8) shows an implicational constraint on the type *head-complement-phrase*:<sup>18</sup>

<sup>15</sup>The first '…' stands for the feature LOCAL, which is irrelevant in the present discussion. It plays a role in the treatment of nonlocal dependencies (Borsley & Crysmann 2021, Chapter 13 of this volume).

<sup>16</sup>To keep things simple, I omitted the feature ARG-ST here. ARG-ST stands for argument structure. The value of ARG-ST is a list containing all arguments, that is, the elements of SPR and COMPS are also contained in the ARG-ST. Linking constraints are formulated with respect to the argument structure list. See Davis, Koenig & Wechsler (2021), Chapter 9 of this volume for a discussion of linking. The way arguments are linked to the valence features SPR and COMPS is language- or language-class-specific. See Chapter 9 and also Müller (2021a: Section 4), Chapter 10 of this volume.

(8) Head-Complement Schema adapted from Sag (1997: 479):

$$
\textit{head-complement-phrase} \Rightarrow
\begin{bmatrix}
\text{SYNSEM|LOC|CAT|COMPS } \langle\,\rangle\\
\text{HEAD-DTR|SYNSEM|LOC|CAT|COMPS } \langle [1], \dots, [n] \rangle\\
\text{NON-HEAD-DTRS } \langle [\,\text{SYNSEM } [1]\,], \dots, [\,\text{SYNSEM } [n]\,] \rangle
\end{bmatrix}
$$

The constraint says that feature structures of type *head-complement-phrase* have to have a SYNSEM value with an empty COMPS list, a HEAD-DTR feature, and a list-valued NON-HEAD-DTRS feature. The list has to contain elements whose SYNSEM values are identical to the respective elements of the COMPS list of the head daughter ([1], …, [n]).
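Read procedurally, the implicational constraint in (8) amounts to a well-formedness check. Here is a sketch under the same simplified dictionary encoding as above (the function name is mine):

```python
# A structure of type head-complement-phrase must have an empty COMPS list at
# the top, a head daughter, and non-head daughters whose SYNSEM values are
# identical to the elements of the head daughter's COMPS list.

def is_head_complement_phrase(phrase) -> bool:
    if phrase["SYNSEM"]["CAT"]["COMPS"] != []:
        return False
    comps = phrase["HEAD-DTR"]["SYNSEM"]["CAT"]["COMPS"]
    dtrs  = phrase["NON-HEAD-DTRS"]
    return len(dtrs) == len(comps) and all(
        d["SYNSEM"] is c                      # token identity, as in (8)
        for d, c in zip(dtrs, comps))
```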

Dominance schemata (corresponding to grammar rules in phrase structure grammars) refer to such phrasal types. (9) shows how the lexical item in (7) can be used in a head-complement configuration:

(9) Analysis of *ate a pizza* in Constructional HPSG:

$$
\begin{bmatrix}
\textit{head-complement-phrase}\\
\text{PHON } \langle \textit{ate, a, pizza} \rangle\\
\text{SYNSEM|LOC|CAT } \begin{bmatrix} \text{HEAD } [1]\\ \text{SPR } \langle [2] \rangle\\ \text{COMPS } \langle\,\rangle \end{bmatrix}\\
\text{HEAD-DTR } \begin{bmatrix} \text{PHON } \langle \textit{ate} \rangle\\ \text{SYNSEM|LOC|CAT } \begin{bmatrix} \text{HEAD } [1]\\ \text{SPR } \langle [2] \rangle\\ \text{COMPS } \langle [3] \rangle \end{bmatrix} \end{bmatrix}\\
\text{NON-HEAD-DTRS } \left\langle \begin{bmatrix} \text{PHON } \langle \textit{a, pizza} \rangle\\ \text{SYNSEM } [3] \end{bmatrix} \right\rangle
\end{bmatrix}
$$

<sup>17</sup>The top level is the outermost level. So in (7), PHONOLOGY and SYNTAX-SEMANTICS are at the top level.

<sup>18</sup>The schema in (8) licenses flat structures. See Müller (2021a: 379), Chapter 10 of this volume for binary branching structures.


The description in the COMPS list of the head is identified with the SYNSEM value of the non-head daughter ([3]). The information about the missing specifier is represented at the mother node ([2]). Head information is also shared between head daughter and mother node. The respective structure sharings are enforced by principles: the Subcategorization Principle or, in more recent versions of HPSG, the Valence Principle makes sure that all valents of the head daughter that are not realized in a certain configuration are still present at the mother node. The Head Feature Principle ensures that the head information of a head daughter in headed structures is identical to the head information on the mother node, that is, HEAD features are shared.
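The effect of the two principles can be shown constructively: given a head daughter and the realized non-head daughters, they determine the mother node. Again a sketch with a hypothetical function name under the simplified encoding used above:

```python
# The Valence Principle passes unrealized valents (here: SPR) up to the
# mother; the Head Feature Principle reuses, rather than copies, the head
# daughter's HEAD value.

def head_complement_mother(head_dtr, non_head_dtrs):
    cat = head_dtr["SYNSEM"]["CAT"]
    assert [d["SYNSEM"] for d in non_head_dtrs] == cat["COMPS"]
    return {
        "SYNSEM": {"CAT": {"HEAD": cat["HEAD"],   # shared, not copied
                           "SPR": cat["SPR"],     # still unsaturated
                           "COMPS": []}},         # all complements realized
        "HEAD-DTR": head_dtr,
        "NON-HEAD-DTRS": non_head_dtrs,
    }
```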

This is a very brief sketch of Constructional HPSG and is by no means intended to be a full-blown introduction to HPSG, but it provides a description of properties that can be used to compare Constructional HPSG to Sign-Based Construction Grammar in the next subsection.

#### **1.3.2 Sign-Based Construction Grammar**

Having discussed some aspects of Constructional HPSG, I now turn to SBCG. SBCG is an HPSG variant, so it shares most properties of HPSG but there are some interesting properties that are discussed in this section. Locality constraints are discussed in the next subsection, and changes in feature geometry in the subsections to follow. Subsection 1.3.2.7 discusses Frame Semantics.

##### 1.3.2.1 Locality constraints

As mentioned in Section 1.2.4, SBCG assumes a strong version of locality: phrasal signs do not have daughters. This is due to the fact that phrasal schemata (= phrasal constructions) are defined as in (10):

(10) Head-Complement Construction following Sag et al. (2003: 481):

$$
\textit{head-comp-cx} \Rightarrow
\begin{bmatrix}
\text{MOTHER|SYN|VAL|COMPS } \langle\,\rangle\\
\text{HEAD-DTR } [0] \begin{bmatrix} \textit{word}\\ \text{SYN|VAL|COMPS } A \end{bmatrix}\\
\text{DTRS } \langle [0] \rangle \oplus A\,\text{:}\,\textit{nelist}
\end{bmatrix}
$$

Rather than specifying syntactic and semantic properties of the complete linguistic object at the top level (as earlier versions of HPSG did), these properties are specified under MOTHER. Hence a construction licenses a sign (a phrase or a complex word), but the sign does not include daughters. The daughters live at the level of the construction only. While earlier versions of HPSG licensed signs directly, SBCG needs a statement saying that all objects under MOTHER are objects licensed by the grammar (Sag, Wasow & Bender 2003: 478):<sup>19</sup> Φ is a well-formed structure according to a grammar *G* if and only if

	- 1. there is a construction *c* in *G*, and
	- 2. there is a feature structure *f* that is an instantiation of *c*, such that Φ is the value of the MOTHER feature of *f*.
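The difference to Constructional HPSG can be mimicked as follows. In this sketch (my own encoding, hypothetical function name), the construction pairs a mother with its daughters, but only the MOTHER value, a daughterless sign, is licensed:

```python
# A construct contains MOTHER and DTRS, but grammars license only the MOTHER
# value: the licensed sign itself provides no access to internal structure.

def head_comp_construct(head_dtr, comp_dtrs):
    mother = {
        "FORM": head_dtr["FORM"] + [w for d in comp_dtrs for w in d["FORM"]],
        "SYN": {"VAL": []},                    # all valents realized
    }
    return {"MOTHER": mother, "DTRS": [head_dtr] + comp_dtrs}

construct = head_comp_construct(
    {"FORM": ["ate"], "SYN": {"VAL": ["NP[acc]"]}},
    [{"FORM": ["a", "pizza"], "SYN": {"VAL": []}}])

licensed_sign = construct["MOTHER"]
assert "DTRS" not in licensed_sign             # daughters are not part of the sign
```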

The idea behind this change in feature geometry is that heads cannot select for daughters of their valents; hence the formal setting is more restrictive, reducing the computational complexity of the formalism (Ivan Sag, p.c. 2011). However, this restriction can be circumvented by simply structure sharing an element of the daughters list with some value within MOTHER. The XARG feature, which makes one argument available at the top level of a projection (Bender & Flickinger 1999), is such a feature. So, at the formal level, the MOTHER feature alone does not result in restrictions on complexity. One would have to forbid such structure sharings in addition, but then one could dispense with MOTHER altogether and state the restriction for earlier variants of HPSG directly (Müller 2018b: Section 10.6.2.1.3).

Note that analyses like the one of the Big Mess Construction by Van Eynde (2018: 841), also discussed in Van Eynde (2021: 303), Chapter 8 of this volume, cannot be directly transferred to SBCG, since in Van Eynde's analysis this construction specifies the phrasal type of its daughters, something that is excluded by design in SBCG: all MOTHER values of phrasal constructions are of type *phrase*, and this type does not have any subtypes (Sag 2012: 98). Daughters in syntactic constructions are of type *word* or *phrase*. So, it is impossible to require a daughter to be of type *regular-nominal-phrase* as in the analysis of Van Eynde. In order to capture the Big Mess Construction in SBCG, one would have to specify the properties of the daughters with respect to their features rather than specifying the types of the daughters, that is, one has to explicitly provide the features that are characteristic of feature structures of type *regular-nominal-phrase* in Van Eynde's analysis rather than just naming the type. See Kay & Sag (2012) and Kim & Sells (2011) for analyses of the Big Mess Construction in SBCG.

<sup>19</sup>A less formal version of this constraint is given as the Sign Principle by Sag (2012: 105): "Every sign must be listemically or constructionally licensed, where: a sign is listemically licensed only if it satisfies some listeme, and a sign is constructionally licensed if it is the mother of some well-formed construct."


##### 1.3.2.2 SPR and COMPS vs. VALENCE

Sag, Wasow & Bender (2003) differentiated between specifiers and complements, but this distinction was given up in later work on SBCG. Sag (2012) has just one valence list that includes both subjects and non-subjects. This is a return to the valence representations of Pollard & Sag (1987). No argument for this move was ever given, despite Borsley's (1987) arguments for separating valence information. With one single valence feature, a VP would be an unsaturated projection, and generalizations concerning phrases cannot be captured. For example, a generalization concerning extraposition (in German) is that maximal projections (that is, projections with an empty COMPS list) can be extraposed (Müller 1999: Section 13.1.2). It is impossible to state this generalization in SBCG in a straightforward way (Müller 2018b: Section 10.6.2.3).

##### 1.3.2.3 The Head Feature Principle

There have been some other developments as well. Sag (2012) got rid of the Head Feature Principle and stated identity of information explicitly within constructions. Structure sharing is not stated with boxed numbers but with capital letters instead. An exclamation mark can be used to specify information that is not shared (Sag 2012: 125). While the use of letters instead of numbers is just a presentational variant, the exclamation mark is a non-trivial extension. (12) shows an example, the constraint on the type *pred-hd-comp-cxt*:

(12) Predicational Head-Complement Construction following Sag (2012: 152):

$$
\textit{pred-hd-comp-cxt} \Rightarrow
\begin{bmatrix}
\text{MOTHER|SYN } X\ !\ [\,\text{VAL } \langle Y \rangle\,]\\
\text{HEAD-DTR } \begin{bmatrix} \textit{word}\\ \text{SYN } X \end{bmatrix}
\end{bmatrix}
$$
The X stands for all syntactic properties of the head daughter. These are identified with the value of SYN of the mother with the exception of the VAL value, which is specified to be a list with the element Y. It is interesting to note that the !-notation is not without problems: Sag (2012: 145) states that the version of SBCG that he presents is "purely monotonic (non-default)", but if the SYN value of the mother is not identical due to the overwriting of VAL, it is unclear how the type of SYN can be constrained. ! can be understood as explicitly sharing all features that are not mentioned after the !. Note, though, that the type has to be shared as well. This is not trivial: structure sharing cannot be applied here, since sharing the type would also identify all features belonging to the respective value. So one would need a relation that singles out the type of a structure and identifies this type with the value of another structure. Note also that information from features behind the ! can make the type of the complete structure more specific. Does this affect the shared structure (e.g., HEAD-DTR|SYN in (12))? What if the type of the complete structure is incompatible with the features in this structure? What seems to be a harmless notational device in fact involves some non-trivial machinery in the background. Keeping the Head Feature Principle makes this additional machinery unnecessary.

##### 1.3.2.4 Feature geometry and the FORM feature

The phrasal sign for *ate a pizza* in Constructional HPSG was given in (9). (13) is the Predicational Head Complement Construction with daughters and mother filled in.

$$
\begin{bmatrix}
\textit{pred-hd-comp-cxt}\\
\text{MOTHER } \begin{bmatrix} \textit{phrase}\\ \text{FORM } \langle \textit{ate, a, pizza} \rangle\\ \text{SYN } \begin{bmatrix} \text{CAT } \textit{verb}\\ \text{VAL } \langle \text{NP[\textit{nom}]} \rangle \end{bmatrix}\\ \text{SEM } \dots \end{bmatrix}\\
\text{HEAD-DTR } [1] \begin{bmatrix} \textit{word}\\ \text{FORM } \langle \textit{ate} \rangle\\ \text{SYN } \begin{bmatrix} \text{CAT } \textit{verb}\\ \text{VAL } \langle \text{NP[\textit{nom}]}, [2]\,\text{NP[\textit{acc}]} \rangle \end{bmatrix} \end{bmatrix}\\
\text{DTRS } \langle [1], [2]\,[\,\text{FORM } \langle \textit{a, pizza} \rangle\,] \rangle
\end{bmatrix}
\tag{13}
$$

As was explained in the previous subsection, Constructional HPSG groups all selectable information under SYNSEM and then differentiates into CAT and CONT. SBCG goes back to Pollard & Sag (1987) and uses SYN and SEM. The idea behind SYNSEM was to exclude the selection of phonological information and daughters (Pollard & Sag 1994: 23). Since daughters are outside of the definition of *synsem*, they cannot be accessed from within valence lists. Now, SBCG pushes this idea one step further and also restricts access to daughters in phrasal schemata (constructions in SBCG terminology): since signs do not have daughters, constructions may not refer to the daughters of their parts. But obviously signs need to have a form part, since signs are by definition form-meaning pairs. It follows that the form part of signs is selectable in SBCG. This will be discussed in more detail in the following subsection. Subsection 1.3.2.6 discusses the omission of the LOCAL feature.

##### 1.3.2.5 Selection of PHON and FORM values

The feature geometry of Constructional HPSG has the PHON value outside of SYNSEM. Therefore verbs can select for syntactic and semantic properties of their arguments but not for their phonology. For example, they can require that an object has accusative case but not that it starts with a vowel. SBCG allows for the selection of phonological information (the feature is called FORM here) and one example of such a selection is the indefinite article in English, which has to be either *a* or *an* depending on whether the noun or nominal projection it is combined with starts with a vowel or not (Flickinger, Mail to the HPSG mailing list, 01.03.2016):

(14) a. an apple
 b. a house

The distinction can be modeled by assuming a selection feature for determiners.<sup>20</sup> An alternative would be, of course, to capture all phonological phenomena by formulating constraints on phonology at the phrasal level (see Bird & Klein 1994, Höhle 1999, and Walther 1999 for phonology in HPSG).
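As an illustration of FORM selection, the choice between the two forms of the indefinite article can be stated as a function of the FORM value of the nominal the article combines with. The vowel test below is a crude orthographic stand-in for the actual phonological condition:

```python
# A determiner sensitive to the FORM value of its nominal sister: *an* before
# a vowel-initial form, *a* otherwise (orthographic approximation only).

def indefinite_article(nominal_form: str) -> str:
    return "an" if nominal_form[0].lower() in "aeiou" else "a"

print(indefinite_article("apple"), indefinite_article("house"))   # -> an a
```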

Note also that the treatment of raising in SBCG admits nonlocal selection of phonology values, since the analysis of raising in SBCG assumes that the element on the valence list of the embedded verb is identical to an element in the ARG-ST list of the matrix verb (Sag 2012: 159). Hence, both verbs in (15) can see the phonology of the subject:

(15) Kim can eat apples.

In principle, there could be languages in which the form of the downstairs verb depends on the presence of an initial consonant in the phonology of the subject. English allows for long chains of raising verbs and one could imagine languages in which all the verbs on the way are sensitive to the phonology of the subject. Such languages probably do not exist.

Now, is this a problem? Not really, but if one develops the general setup in such a way as to exclude everything that is not attested in the languages of the world (as for instance the selection of arguments of arguments of arguments), then it is a problem that heads can see the phonology of elements that are far away.

<sup>20</sup>In the 1994 version of HPSG there is mutual selection between the determiner and the noun. The noun selects the determiner via SPR and the determiner selects the noun via a feature called SPECIFIED (Pollard & Sag 1994: 45–54).

There are two possible conclusions for practitioners of Sign-Based Construction Grammar: either the MOTHER feature could be given up, since one agrees that theories that do not make wrong predictions are sufficiently constrained and one does not have to explicitly state what cannot occur in languages, or one would have to address the problem with nonlocally selected phonology values and therefore assume a SYNSEM or LOCAL feature that bundles information that is relevant in raising and does not include the phonology. In the latter case, the feature geometry of SBCG would get more complicated. This additional complication is further evidence against MOTHER, adding to the argument I made about MOTHER in Subsection 1.3.2.1.

##### 1.3.2.6 The LOCAL feature and information shared in nonlocal dependencies

Similarly, elements of the ARG-ST list contain information about FORM. In nonlocal dependencies, this information is shared in the GAP list (SLASH set or list in other versions of HPSG) and is available all the way to the filler (Sag 2012: Section 10). In other versions of HPSG, only LOCAL information is shared and elements in valence lists do not have a PHON feature. If the sign that is contained in the GAP list were identified with the filler, the information about phonological properties of the filler would be available at the extraction site and SBCG could be used to model languages in which the phonology of a filler is relevant for a head from which it is extracted. So for instance, *likes* could see the phonology of *bagels* in (16):

(16) Bagels, I think that Peter likes.

It would be possible to state constraints saying that the filler has to contain a vowel or two vowels or that it ends with a consonant. In addition, all elements on the extraction path (*that* and *think*) can see the phonology of the filler as well. While there are languages that mark the extraction path (Bouma et al. 2001: 4–5; Borsley & Crysmann 2021: 550–551, Chapter 13 of this volume), I doubt that there are languages that have phonological effects over unbounded dependencies. This problem can be and has been solved by assuming that the filler is not shared with the information in the GAP list, but parts of the filler are shared with parts in the GAP list: Sag (2012: 166) assumes that SYN, SEM, and STORE information is identified individually. Originally, the feature geometry of HPSG was motivated by the wish to structure share information. Everything within LOCAL was shared between filler and extraction site. This kind of motivation is given up in SBCG.

### Stefan Müller

Note, also, that not sharing the complete filler with the gap means that the FORM value of the element in the ARG-ST list at the extraction site is not constrained. Without any constraints, the theory would be compatible with infinitely many models, since the FORM value could be anything. For example, the FORM value of an extracted adjective could be ⟨ *Donald Duck* ⟩ or ⟨ *Dunald Dock* ⟩ or any arbitrary chaotic sequence of letters/phonemes. To exclude this, one can stipulate the FORM values of extracted elements to be the empty list, but this leaves one with the unintuitive situation that the element in GAP has an empty FORM list while the corresponding filler has a different, filled one.

See also Borsley & Crysmann (2021: Section 10), Chapter 13 of this volume for a comparison of the treatment of unbounded dependencies in Constructional HPSG and SBCG.

##### 1.3.2.7 Frame Semantics

Another difference between SBCG and other variants of HPSG is the use of Frame Semantics (Fillmore 1982; 1985a). The actual representations in SBCG are based on MRS (Minimal Recursion Semantics, Copestake et al. 2005, see also Koenig & Richter 2021, Chapter 22 of this volume) and the change seems rather cosmetic (relations have the suffix *-fr* for frame rather than *-rel* for relation and the feature is called FRAMES rather than RELATIONS), but there is one crucial difference: the labels of semantic roles are more specific than what is usually used in other variants of HPSG.<sup>21</sup> Sag (2012: 89) provides the following representation for the meaning contribution of the verb *eat*:

$$
\begin{bmatrix}
\textit{sem-obj}\\
\text{INDEX } s\\
\text{FRAMES } \left\langle \begin{bmatrix} \textit{eating-fr}\\ \text{LABEL } l\\ \text{SIT } s\\ \text{INGESTOR } i\\ \text{INGESTIBLE } j \end{bmatrix} \right\rangle
\end{bmatrix}
\tag{17}
$$

While some generalizations over verbs of a certain type can be captured with role labels like INGESTOR and INGESTIBLE, this is limited to verbs of ingestion. More general role labels like AGENT and PATIENT (or PROTO-AGENT and PROTO-PATIENT, Dowty 1991, or ACTOR and UNDERGOER, Van Valin 1999) allow for generalizations over broader classes of verbs (see Davis & Koenig 2000, Davis 2001: Section 4.2.1, and Davis, Koenig & Wechsler 2021, Chapter 9 of this volume).

<sup>21</sup>Pollard & Sag (1987: 95) and Pollard & Sag (1994) use role labels like KISSER and KISSEE that are predicate-specific. Generalizations over these feature names are impossible within the standard formal setting of HPSG (Pollard & Sag 1994: Section 8.5.3; Müller 1999: 24, Fn. 1; Davis 2001: Section 4.2.1).

#### **1.3.3 Summary**

This section enumerated various flavors of Construction Grammar and briefly discussed the more formal variants. It was noted that the formal underpinnings are rather similar in many cases. What is different, though, is the kind of approach taken towards the representation of valence and argument structure constructions. Constructional HPSG and SBCG differ from other Construction Grammars in taking a strongly lexicalist stance (Sag & Wasow 2011: Section 10.4.3; Wasow 2021: Section 3.4, Chapter 24 of this volume): argument structure is encoded lexically. A ditransitive verb is a ditransitive verb because it selects for three NP arguments. This selection is encoded in the valence features of lexical items. It is not assumed that phrasal configurations can license additional arguments, as is the case in basically all other variants of Construction Grammar. The next section discusses phrasal CxG approaches in more detail. Section 4 then discusses patterns that should be analyzed phrasally and which are problematic for entirely head-driven (or rather functor-driven) theories like Categorial Grammar, Dependency Grammar, and Minimalism.

## **2 Valence vs. phrasal patterns**

Much work in Construction Grammar starts from the observation that children acquire patterns and, in later acquisition stages, abstract from these patterns to schemata containing open slots to be filled by variable material, for example subjects and objects (Tomasello 2003). The conclusion that is drawn from this is that language should be described with reference to phrasal patterns. Most Construction Grammar variants assume a phrasal approach to argument structure constructions (Goldberg 1995; 2006; Goldberg & Jackendoff 2004), with Constructional HPSG (Sag 1997), Boas's (2003) work, and SBCG (Sag, Boas & Kay 2012; Sag 2012) being the three exceptions. So, for examples like the resultative construction in (18), Goldberg (1995: Chapter 8) assumes that there is a phrasal construction [Subj [V Obj Obl]]<sup>22</sup> into which material is inserted and which contributes the resultative semantics as a whole.

<sup>22</sup>Goldberg does not state the resultative construction, but the Caused-Motion Construction, which is syntactically parallel to the Resultative Construction, is specified this way on p. 152. She describes the syntax of resultative constructions on p. 192.


(18) She fished the pond empty.

HPSG follows the lexical approach and assumes that *fish*- is inserted into a lexical construction (lexical rule), which licenses the combination with other parts of the resultative construction (Müller 2002: Section 5.2).

I argued in several publications that the language acquisition facts can be explained in lexical models as well (Müller 2010: Section 6.3; Müller & Wechsler 2014a: Section 9). While a pattern-based approach claims that (19) is analyzed by inserting *Kim*, *loves*, and *Sandy* into a phrasal schema stating that NP[nom] verb NP[acc] or subject verb object are possible sequences in English, a lexical approach would state that there is a verb *loves* selecting for an NP[nom] and an NP[acc] (or for a subject and an object).

(19) Kim loves Sandy.

Since objects follow the verb in English (modulo extraction) and subjects precede the verb, the same sequence is licensed in the lexical approach. The lexical approach does not have any problems accounting for patterns in which the sequence of subject, verb, and object is discontinuous. For example, an adverb may intervene between subject and verb:

(20) Kim really loves Sandy.

In a lexical approach it is assumed that verb and object may form a unit (a VP). The adverb attaches to this VP and the resulting VP is combined with the subject. The phrasal approach has to assume either that adverbs are part of phrasal schemata licensing cases like (20) (see Uszkoreit 1987: Section 6.3.2 for such a proposal in a GPSG approach to German) or that the phrasal construction may license discontinuous patterns. Bergen & Chang (2005: 170) follow the latter approach and assume that subject and verb may be discontinuous but verb and object(s) have to be adjacent. While this accounts for adverbs like the one in (20), it does not solve the general problem, since there are other examples showing that verb and object(s) may appear discontinuously as well:

(21) Mary tossed me a juice and Peter a water.

Even though *tossed* and *Peter a water* are discontinuous in (21), they are an instance of the ditransitive construction. The conclusion is that what has to be acquired is not a phrasal pattern but rather the fact that there are dependencies between certain elements in phrases (see also Behrens 2009 for a similar view from a language acquisition perspective). I return to ditransitive constructions in Section 2.3.
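The lexical analysis of (19) and (20) sketched above can be made concrete. The following minimal sketch is not part of the chapter's formal apparatus (all names such as `Sign` and `head_complement` are illustrative): the verb's complement requirement is saturated first, yielding the VP, the adverb adjoins to the saturated VP without any phrasal schema mentioning adverbs, and the subject requirement is saturated last.

```python
# Minimal sketch of the lexical account of (19)/(20): valence is saturated
# step by step, so an adverb can attach to the verb-object unit (the VP)
# without any phrasal schema mentioning adverbs.
# All names (Sign, head_complement, ...) are illustrative only.

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sign:
    phon: tuple[str, ...]          # surface string
    head: str                      # part of speech of the head
    subj: tuple[str, ...] = ()     # unsaturated subject requirement
    comps: tuple[str, ...] = ()    # unsaturated complement requirements


def head_complement(head: Sign, comp: Sign) -> Sign:
    """Saturate the first COMPS requirement; complements follow the verb."""
    assert head.comps and head.comps[0] == comp.head, "selection failure"
    return replace(head, phon=head.phon + comp.phon, comps=head.comps[1:])


def head_adjunct(adjunct: Sign, head: Sign) -> Sign:
    """Adjoin a preverbal adverb to a saturated VP; valence is unchanged."""
    assert head.head == "verb" and not head.comps
    return replace(head, phon=adjunct.phon + head.phon)


def head_subject(subj: Sign, head: Sign) -> Sign:
    """Saturate the SUBJ requirement; subjects precede the VP."""
    assert head.subj and head.subj[0] == subj.head
    return replace(head, phon=subj.phon + head.phon, subj=head.subj[1:])


loves = Sign(("loves",), "verb", subj=("noun",), comps=("noun",))
kim, sandy = Sign(("Kim",), "noun"), Sign(("Sandy",), "noun")
really = Sign(("really",), "adv")

vp = head_complement(loves, sandy)                 # loves Sandy
s20 = head_subject(kim, head_adjunct(really, vp))
print(" ".join(s20.phon))                          # Kim really loves Sandy
```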


I discussed several phrasal approaches to argument structure and showed where they fail (Müller 2006a,b; 2007; 2010; Müller & Wechsler 2014a,b; Müller 2018a). Of course, the discussion cannot be reproduced here, but I want to repeat four points showing that a lexical representation of valence is necessary and that effects that are the highlight of phrasal approaches can be achieved in lexical proposals as well. The first two are problems that were around in GPSG times and were basically solved by abandoning the framework and adopting a new framework which was a fusion of GPSG and Categorial Grammar: HPSG.<sup>23</sup>

# **2.1 Derivational morphology and valence**

The first argument (Müller 2016: Section 5.5.1) is that certain patterns in derivational morphology refer to valence. For example, the -*bar* 'able' derivation productively applies to transitive verbs only, that is, to verbs that govern an accusative.

(22) b. * helfbar
        helpable
     c. * schlafbar
        sleepable

Note that the -*bar* 'able' derivation is like the passive in that it suppresses the subject and promotes the accusative object: the accusative object is the element adjectives derived with the -*bar* 'able' derivation predicate over. There is no argument realized with the adjective *unterstützbaren* 'supportable' attaching to *Arbeitsprozessen* 'work.processes' in *unterstützbaren Arbeitsprozessen*. <sup>24</sup> Hence one could not claim that the stem enters a phrasal construction with arguments and -*bar* attaches to this phrase. It follows that information about valence has to be present in stems.

<sup>23</sup>For further criticism of GPSG see Jacobson (1987). A detailed discussion of reasons for abandoning GPSG can be found in Müller (2016: Section 5.5).

<sup>24</sup>Adjectives realize their arguments preverbally in German:

(i) der [seiner Frau treue] Mann
    the his wife faithful man
    'the man who is faithful to his wife'

*unterstützbaren* 'supportable' does not take an argument; it is a complete adjectival projection like *seiner Frau treue*.


Note also that the resultative construction interacts with the -*bar* 'able' derivation. (23) shows an example of the resultative construction in German in which the accusative object is introduced by the construction: it is the subject of *leer* 'empty' but not a semantic argument of the verb *fischt* 'fishes'.

(23) Sie fischt den Teich leer.
     she fishes the pond empty

So even though the accusative object is not a semantic argument of the verb, the -*bar* 'able' derivation is possible and an adjective like *leerfischbar* 'empty.fishable' meaning 'can be fished empty' can be derived. This is explained by lexical analyses of the -*bar* 'able' derivation and the resultative construction, since if one assumes that there is a lexical item for the verb *fisch*- selecting an accusative object and a result predicate, then this item may function as the input for the -*bar* 'able' derivation. See Section 3 for further discussion of -*bar* 'able' derivation and Verspoor (1997), Wechsler (1997), Wechsler & Noh (2001), and Müller (2002: Chapter 5) for lexical analyses of the resultative construction in the framework of HPSG.
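The feeding relation can be sketched as the composition of two rules. The following toy encoding uses made-up rule functions and feature names, not the chapter's actual formalization, and glosses over the incorporation of *leer* into the phonology; it only illustrates why a lexical item with an added accusative object is a licit input to the -*bar* rule.

```python
# Sketch of lexical-rule feeding: a resultative lexical rule adds an
# accusative object and a result predicate to fisch- 'fish'; its output is
# transitive and hence a possible input to the -bar rule.
# Rule functions and feature names are illustrative only.

def resultative_lr(stem):
    """Map a verb stem onto one selecting NP[acc] plus a result predicate."""
    assert stem["cat"] == "verb"
    return {**stem,
            "comps": ["NP[acc]", "result-pred"] + stem["comps"],
            "sem": ("cause-become", stem["sem"])}

def bar_lr(stem):
    """-bar derivation: requires an accusative object (cf. *helfbar)."""
    assert stem["cat"] == "verb" and "NP[acc]" in stem["comps"]
    return {"cat": "adj", "phon": stem["phon"] + "bar",
            "sem": ("possible", stem["sem"])}

fisch = {"cat": "verb", "phon": "fisch", "comps": [], "sem": "fish"}
print(bar_lr(resultative_lr(fisch)))
# {'cat': 'adj', 'phon': 'fischbar', 'sem': ('possible', ('cause-become', 'fish'))}
# bar_lr(fisch) itself would fail: this toy entry selects no accusative.
```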

# **2.2 Partial verb phrase fronting**

The second argument concerns partial verb phrase fronting (Müller 2016: Section 5.5.2). (24) gives some examples: in (24a) the bare verb is fronted and its arguments are realized to the right of the finite verb in the so-called middle field, in (24b) one of the objects is fronted together with the verb, and in (24c) both objects are fronted with the verb.

(24) a. Erzählen wird er seiner Tochter ein Märchen können.
        tell will he his daughter a fairy.tale can
     b. Ein Märchen erzählen wird er seiner Tochter können.
        a fairy.tale tell will he his daughter can
     c. Seiner Tochter ein Märchen erzählen wird er können.
        his daughter a fairy.tale tell will he can
     'He will be able to tell his daughter a fairy tale.'

The problem with sentences such as those in (24) is that the valence requirements of the verb *erzählen* 'to tell' are realized in various positions in the sentence. For fronted constituents, one requires a rule which allows a ditransitive to be realized without its arguments or with one or two objects. This basically destroys the idea of a fixed phrasal configuration for the ditransitive construction and points again in the direction of dependencies.


Furthermore, it has to be ensured that the arguments that are missing in the prefield are realized in the remainder of the clause. It is not legitimate to omit obligatory arguments or realize arguments with other properties like a different case, as the examples in (25) show:

(25) b. * Verschlungen hat er nicht.
        devoured has he.NOM not
     c. * Verschlungen hat er ihm nicht.
        devoured has he.NOM him.DAT not

The obvious generalization is that the fronted and unfronted arguments must add up to the total set of arguments selected by the verb. This is scarcely possible with the rule-based representation of valence in GPSG (Nerbonne 1986; Johnson 1986). In theories such as Categorial Grammar, it is possible to formulate elegant analyses of (24) (Geach 1970). Nerbonne (1986) and Johnson (1986) both suggest analyses for sentences such as (24) in the framework of GPSG which ultimately amount to changing the representation of valence information in the direction of Categorial Grammar. With the switch to CG-like valence representations in HPSG, the phenomenon of partial verb phrase fronting found elegant solutions (Höhle 2019: Section 4; Müller 1996; Meurers 1999).
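Under the simplifying assumption that a valence list can be modeled as a multiset of argument descriptions, the adding-up generalization can be stated in a few lines; the encoding below is an illustration, not HPSG notation.

```python
# Sketch of the generalization about (24)/(25): the arguments realized in
# the fronted partial VP and those realized in the remainder of the clause
# must add up to exactly the valence of the verb.
# Names and the list encoding are illustrative only.

ERZAEHLEN = ["NP[nom]", "NP[dat]", "NP[acc]"]      # erzählen 'tell'
VERSCHLINGEN = ["NP[nom]", "NP[acc]"]              # verschlingen 'devour'

def licensed(fronted_args, remaining_args, valence):
    """Fronted and unfronted arguments must add up to the verb's valence."""
    return sorted(fronted_args + remaining_args) == sorted(valence)

# (24a-c): any split of the objects between prefield and middle field is fine.
print(licensed([], ["NP[nom]", "NP[dat]", "NP[acc]"], ERZAEHLEN))  # True
print(licensed(["NP[acc]"], ["NP[nom]", "NP[dat]"], ERZAEHLEN))    # True
print(licensed(["NP[dat]", "NP[acc]"], ["NP[nom]"], ERZAEHLEN))    # True
# (25b): the obligatory accusative object is missing altogether.
print(licensed([], ["NP[nom]"], VERSCHLINGEN))                     # False
# (25c): a dative is realized instead of the required accusative.
print(licensed([], ["NP[nom]", "NP[dat]"], VERSCHLINGEN))          # False
```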

# **2.3 Coercion**

An important observation in constructionist work is that, in certain cases, verbs can be used in constructions that differ from the constructions they are normally used in. For example, verbs that are usually used with one or two arguments may be used in the ditransitive construction:

(26) a. She smiled.
     b. She smiled herself an upgrade.<sup>25</sup>
     c. He baked a cake.
     d. He baked her a cake.

The usual explanation for sentences like (26b) and (26d) is that there is a phrasal pattern with three arguments into which intransitive and strictly transitive verbs may enter. It is assumed that the phrasal patterns are associated with a certain meaning (Goldberg 1996; Goldberg & Jackendoff 2004). For example, the benefactive meaning of (26d) is contributed by the phrasal pattern (Goldberg 1996: Section 6; Asudeh, Giorgolo & Toivonen 2014: 81).

<sup>25</sup>Douglas Adams. 1979. *The Hitchhiker's Guide to the Galaxy*, Harmony Books. Quoted from Goldberg (2003: 220).

The insight that a verb is used in the ditransitive pattern and thereby contributes a certain meaning is of course also captured in lexical approaches. Briscoe & Copestake (1999: Section 5) suggested a lexical rule-based analysis mapping a transitive version of verbs like *bake* onto a ditransitive one and adding the benefactive semantics. This is parallel to the phrasal approach in that it says: three-place *bake* behaves like other three-place verbs (e.g., *give*) in taking three arguments and by doing so, it comes with a certain meaning (see Müller 2018a for a lexical rule-based analysis of the benefactive constructions that works for both English and German, despite the surface differences of the respective languages). The lexical rule is a form-meaning pair and hence a construction. As Croft put it 18 years ago: lexical rule vs. phrasal schema is a false dichotomy (Croft 2003). But see Müller (2018a; 2006a; 2013) and Müller & Wechsler (2014a) for differences between the approaches.

Briscoe & Copestake (1999) paired their lexical rules with probabilities to be able to explain differences in productivity. This corresponds to the association strength that van Trijp (2011: 141) used in Fluid Construction Grammar to relate lexical items to phrasal constructions of various kinds.
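The following sketch is in the spirit of Briscoe & Copestake's proposal but is not their actual formalism: it maps a strictly transitive entry onto a ditransitive one, adds the benefactive semantics, and carries a probability modeling graded productivity; the value 0.8 is an arbitrary placeholder.

```python
# Sketch of a lexical rule in the spirit of Briscoe & Copestake (1999):
# transitive bake is mapped onto a ditransitive entry with benefactive
# semantics; a probability on the rule models graded productivity.
# The encoding is illustrative only.

def benefactive_lr(entry, rule_prob=0.8):   # rule_prob: placeholder value
    assert entry["comps"] == ["NP"], "input must be strictly transitive"
    return {"orth": entry["orth"],
            "comps": ["NP", "NP"],                  # He baked her a cake.
            "sem": ("benefactive", entry["sem"]),   # meaning added by the rule
            "prob": entry.get("prob", 1.0) * rule_prob}

bake = {"orth": "bake", "comps": ["NP"], "sem": "bake"}
print(benefactive_lr(bake))
# {'orth': 'bake', 'comps': ['NP', 'NP'], 'sem': ('benefactive', 'bake'), 'prob': 0.8}
```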

# **2.4 Non-predictability of valence**

The last subsection discussed phrasal models of coercion that assume that verbs can be inserted into constructions that are compatible with the semantic contribution of the verb. Müller & Wechsler (2014a: Section 7.4) pointed out that this is not sufficiently constrained. Müller & Wechsler discussed the examples in (27), among others:

(27) a. John depends \*(on) Mary.
     b. John trusts (\*on) Mary.

While *depends* can be combined with an *on*-PP, this is impossible for *trusts*. The form of the preposition in prepositional objects is also not always predictable from semantic properties of the verb. So there has to be a way to state that certain verbs go together with certain kinds of arguments and others do not. A lexical specification of valence information is the most direct way to do this. Phrasal approaches sometimes assume other means to establish connections between lexical items and phrasal constructions. For instance, Goldberg (1995: 50) assumes that verbs are "conventionally associated with constructions". The more technical work in Fluid CxG assumes that every lexical item is connected to various phrasal constructions via coapplication links (van Trijp 2011: 141). This is very similar to Lexicalized Tree Adjoining Grammar (LTAG; Schabes, Abeillé & Joshi 1988), where a rich syntactic structure is associated with a lexical anchor. So, phrasal approaches that link syntactic structure to lexical items are actually lexical approaches as well. As in GPSG, they include means to ensure that lexical items enter into the correct constructions. In GPSG, this was taken care of by a number linking lexical items to the rules they may occur in. I already discussed the GPSG shortcomings in previous subsections.
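Returning to (27): a PFORM-style lexical encoding of the contrast can be sketched as follows; the entries and feature names are illustrative only.

```python
# Sketch of lexically specified preposition selection as in (27): depend
# selects a PP headed by on, trust selects a bare NP, and nothing in the
# verbs' semantics predicts the difference. Illustrative encoding only.

LEXICON = {
    "depend": {"comps": [("PP", "on")]},   # John depends *(on) Mary.
    "trust":  {"comps": [("NP", None)]},   # John trusts (*on) Mary.
}

def selects(verb, category, pform=None):
    """Check whether the verb's lexical entry licenses this argument."""
    return (category, pform) in LEXICON[verb]["comps"]

print(selects("depend", "PP", "on"))   # True
print(selects("trust", "PP", "on"))    # False: *John trusts on Mary.
print(selects("trust", "NP"))          # True
```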

Concluding this section, it can be said that there has to be a connection between lexical items and their arguments and that a lexical representation of argument structure is the best way to establish such a relation.

# **3 Construction Morphology**

The first publications in Construction Morphology were the master's thesis of Riehemann (1993), which later appeared as Riehemann (1998), and Koenig's 1994 WCCFL paper and thesis (Koenig & Jurafsky 1995; Koenig 1994; 1999). Riehemann called her framework *Type-Based Derivational Morphology*, since it was written before influential work like Goldberg (1995) appeared and before the term *Construction Morphology* (Booij 2005) was used. Riehemann did a careful corpus study on adjective derivations with the suffix -*bar* '-able'. She noticed that there is a productive pattern that can be analyzed by a lexical rule relating a verbal stem to the adjective suffixed with -*bar*. <sup>26</sup> The productive pattern applies to verbs governing an accusative as in (28a) but is incompatible with verbs taking a dative as in (28b):

(28) b. * helfbar
        helpable
     c. * schlafbar
        sleepable

Intransitive verbs are also excluded, as (28c) shows. Riehemann suggests a schema like the one in (29):

<sup>26</sup>She did not call her rule a lexical rule, but the difference between her template and the formalization of lexical rules by Müller (2002: 26) is the naming of the feature MORPH-B vs. LEX-DTR. Copestake & Briscoe (1992: Section 8.2.3), Briscoe & Copestake (1999: Section 2), and Meurers (2001: 176) use a representation with IN and OUT features that actually corresponds to the MOTHER/DTRS format of SBCG. See Section 1.3.2.1.


(29) Schema for productive adjective derivations with the suffix -*bar* in German adapted from Riehemann (1998: 68):

MORPH-B is a list that contains a description of a transitive verb (something that governs an accusative object which is linked to the undergoer role (2) and has an actor).<sup>27</sup> The phonology of this element (1) is combined with the suffix *bar* and forms the phonology of the complete lexical item. The resulting object is of category *adj*, and the index of the accusative object of the input verb (2) is identified with the one of the subject of the resulting adjective and with the value of the UNDERGOER feature in the semantic representation of the adjective. The semantics of the input verb (4) is embedded under a modal operator in the semantics of the adjective.

<sup>27</sup>Note that the specification of the type *trans-verb* in the list under MORPH-B is redundant, since it is stated that there has to be an accusative object and that there is an actor and an undergoer in the semantics. Depending on further properties of the grammar, the specification of the type is actually wrong: productively derived particle verbs may be input to the -*bar* 'able' derivation, and these are not a subtype of *trans-verb*, since the respective particle verb rule derives both transitive (*anlachen* 'laugh at somebody') and intransitive verbs (*loslachen* 'start to laugh') (Müller 2003b: 296). *Anlachen* does not have an undergoer in the semantic representation suggested by Stiebels (1996). See Müller (2003b: 308) for a version of the -*bar* 'able' derivation schema that is compatible with particle verb formations as input.

The original formulation of Riehemann shares the CONT value of the semantics of the accusative NP with the subject of the adjective and the value of the UNDERGOER feature. I adapted the rule here to just share the index, since the values of ACTOR and UNDERGOER features are of type *index*. Jean-Pierre Koenig pointed out to me that sharing the whole content of the accusative object and the subject of the adjective is necessary, since otherwise the CONT value of the accusative object would be unrestricted and – according to the formal basics of HPSG – could vary in infinitely many ways. Such an explicit sharing of semantics is not necessary in Müller's approach, since he distinguishes between structural and lexical case (Przepiórkowski 2021: Section 2, Chapter 7 of this volume), and this makes it possible to structure-share the complete description of the accusative object with the subject of the adjective.
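The prose description of (29) can be rendered as a small sketch; the dictionary encoding and the modal operator label `can` are stand-ins for Riehemann's actual feature structure.

```python
# Sketch of the schema in (29), following the prose description above:
# the MORPH-B element must be a transitive verb; its phonology (1) is
# concatenated with -bar, the index (2) of its accusative object becomes
# the subject index of the adjective, and its semantics (4) is embedded
# under a modal operator. Illustrative encoding only.

def bar_schema(morph_b):
    assert morph_b["governs"] == "acc", \
        "only accusative-governing verbs feed the productive pattern"
    return {"cat": "adj",
            "phon": morph_b["phon"] + "bar",        # (1) + bar
            "subj-index": morph_b["obj-index"],     # tag (2) shared
            "sem": ("can", morph_b["sem"])}         # (4) under a modal

unterstuetz = {"phon": "unterstütz", "governs": "acc",
               "obj-index": "x2", "sem": "support"}
print(bar_schema(unterstuetz))
# {'cat': 'adj', 'phon': 'unterstützbar', 'subj-index': 'x2',
#  'sem': ('can', 'support')}
```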

While the description of the -*bar* 'able' derivation given so far captures the situation quite well, there are niches and isolated items that are exceptions. According to Riehemann (1998: 5), this was the case for 7% of the adjectives she looked at in her corpus study. Examples are verbs ending in -*ig* like *entschuldigen* 'to excuse'. The -*ig* is dropped in the derivation:

(30) entschuldbar
     excusable

Other cases are lexicalized forms like *essbar* 'safely edible', which have a special lexicalized meaning. Exceptions to the accusative requirement are verbs selecting a dative (31a), a prepositional object (31b), reflexive verbs (31c), and even intransitive, mono-valent verbs (31d):

(31) b. verfügbar
        available
     c. regenerierbar
        regenerable
     d. brennbar
        inflammable

To capture generalizations about productive, semi-productive, and fixed patterns/items, Riehemann suggests a type hierarchy, parts of which are provided in Figure 2. The type *bar-adj* stands for all -*bar* adjectives and comes with the constraints that apply to all of them. One subtype of this general type is *trans-bar-adj*, which subsumes all adjectives that are derived from transitive verbs. This includes all regularly derived -*bar* adjectives, which are of the type *reg-bar-adj*, but also *essbar* 'edible' and *sichtbar* 'visible'.

Figure 2: Parts of the type hierarchy for -*bar* 'able' derivation adapted from Riehemann (1998: 15)

As this recapitulation of Riehemann's proposal shows, the analysis is a typical CxG analysis: V-*bar* is a partially filled word (see Goldberg's examples in Table 32.1). The schema in (29) is a form-meaning pair. Exceptions and subregularities are represented in an inheritance network.
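The inheritance network can be mimicked with a class hierarchy; the following sketch uses the type names of Figure 2, with illustrative attributes, to show how regular cases inherit the modal semantics while lexicalized *essbar* overrides it.

```python
# Sketch of the inheritance network in Figure 2, with types rendered as
# classes: constraints on bar-adj hold of all -bar adjectives,
# trans-bar-adj adds the transitive-base constraint, reg-bar-adj supplies
# the regular modal semantics, and lexicalized essbar overrides it.
# Attribute names are illustrative only.

class BarAdj:                        # constraints common to all -bar adjectives
    cat = "adj"
    suffix = "bar"

class TransBarAdj(BarAdj):           # derived from a transitive verb
    base_governs = "acc"

class RegBarAdj(TransBarAdj):        # the fully productive pattern
    def sem(self):
        return ("can", self.base_sem)

class Unterstuetzbar(RegBarAdj):     # a regular instance
    base_sem = "support"

class Essbar(TransBarAdj):           # lexicalized: special meaning
    base_sem = "eat"
    def sem(self):
        return "safely-edible"       # overrides the regular modal semantics

print(Unterstuetzbar().sem())        # ('can', 'support')
print(Essbar().sem())                # 'safely-edible'
```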

# **4 Phrasal patterns**

Section 2 discussed the claim that constructions in the sense of CxG have to be phrasal. I showed that this is not true and that in fact lexical approaches to valence have to be preferred under the assumptions usually made in non-transformational theories. However, there are other areas of grammar that give exclusively head-driven approaches like Categorial Grammar, Minimalism, and Dependency Grammar a hard time. In what follows, I discuss the NPN Construction and various forms of filler-gap constructions.

# **4.1 The NPN Construction**

Matsuyama (2004) and Jackendoff (2008) discuss the NPN Construction, examples of which are provided in (32):

(32) b. Day after day after day went by, but I never found the courage to talk to her. (Bargmann 2015)

The properties of the NPN Construction (with *after*) are summarized by Bargmann (2015) in a concise way, and I repeat his examples and summary below to motivate his analysis in (40).

The examples in (32) show that the N-after-N Construction has the *distribution of NPs*. As (33) shows, the construction is *partially lexically fixed*: *after* cannot be replaced by any other word (Matsuyama 2004: 73).

(33) Alex asked me question { after / \* following / \* succeeding } question.

The construction is *partially lexically flexible*: the choice of Ns is free, except for the fact that the Ns must be identical (34a), the Ns must be count nouns (34b), Ns must be in the singular (34c), and the Ns must be bare (34d).




The construction is *syntactically fixed*: N-after-N cannot be split by syntactic operations as the contrast in (35) shows (Matsuyama 2004):

(35) a. Man after man passed by.

b. \* Man passed by after man.

If extraposition of the *after*-N constituent were possible, (35b) with an extraposed *after man* should be fine but it is not, so NPN seems to be a fixed configuration.

There is a syntax-semantics mismatch: while N-after-N is syntactically singular, as (36) shows, it is plural semantically, as (37) shows:


Furthermore, there is an aspect of semantic sequentiality: N-after-N conveys a temporal or spatial sequence; as Bargmann (2015) states, the meaning of (38a) is something like (38b).

(38) a. Man after man passed by.

b. First one man passed by, then another(, then another(, then another(, then … ))).

The Ns in the construction do not refer to one individual each; rather, they contribute to a holistic meaning.

The NPN construction allows adjectives to be combined with the nouns, but this is restricted. N1 can only be preceded by an adjective if N2 is preceded by the same adjective:

(39) a. bad day after bad day (N1 and N2 are preceded by the same adjective.)
     b. * bad day after awful day (N1 and N2 are preceded by different adjectives.)
     c. * bad day after day (Only N1 is preceded by an adjective.)
     d. day after bad day (Only N2 is preceded by an adjective.)

Finally, *after* N may be iterated to emphasize the fact that there are several referents of N, as the example in (32b) shows.


This empirical description is covered by the following phrasal construction, which is adapted from Bargmann (2015):<sup>28</sup>

(40) NPN Construction as formalized by Bargmann (2015):

    PHON ⟨… N …, after, … N …⟩
    SS|LOC|CAT [ HEAD noun [COUNT −, AGR 3rdsing]
                 VAL [ SPR ⟨⟩, COMPS ⟨⟩ ] ]
    SR ∃X. |X| > 1 & ∀x ∈ X: N′(x) & ∃R ⊆ X² & …
    DTRS ⟨ [ PHON ⟨… N …⟩
             SS|L|C [ HEAD noun [COUNT +, AGR 3rdsing]
                      VAL [ SPR ⟨DET⟩, COMPS ⟨⟩ ] ]
             SR … N′(x) … ],
           ⟨ [ PHON ⟨after⟩, HEAD prep, SR ∃R ⊆ X² ],
             [ PHON ⟨… N …⟩
               SS|L|C [ HEAD noun [COUNT +, AGR 3rdsing]
                        VAL [ SPR ⟨DET⟩, COMPS ⟨⟩ ] ]
               SR … N′(x) … ] ⟩+ ⟩

There is a list of daughters consisting of a first daughter and an arbitrarily long list of *after* N pairs. The '+' means that there has to be at least one *after* N pair. The nominal daughters select for a determiner via SPR, so they can be either bare nouns or nouns modified by adjectives. The semantic representation, non-standardly represented as the value of SR, says that there have to be several objects in a set X (∃X. |X| > 1) and that, for all of them, the meaning of the N has to hold (∀x ∈ X: N′(x)). Furthermore, there is an order between the elements of X, as stated by ∃R ⊆ X².
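The daughters constraint can be illustrated with a toy recognizer over word strings; this is of course only a stand-in for the feature-structure construction in (40) and ignores adjectives, the count/number constraints, and the semantics.

```python
# Toy recognizer for the daughters constraint in (40): one nominal
# daughter followed by one or more (after, N) pairs with identical Ns.
# A string-based check is only a stand-in for the feature-structure
# construction; it ignores adjectives, count/number, and semantics.

def npn_match(words):
    """Accept N (after N)+ with identical Ns."""
    if len(words) < 3 or len(words) % 2 == 0:
        return False
    first = words[0]
    pairs = [(words[i], words[i + 1]) for i in range(1, len(words), 2)]
    return all(p == "after" and n == first for p, n in pairs)

print(npn_match("day after day".split()))            # True
print(npn_match("day after day after day".split()))  # True: the '+' in (40)
print(npn_match("day after week".split()))           # False: Ns not identical
```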

<sup>28</sup>Jackendoff and Bargmann assume that the result of combining N, P, and N is an NP. However, this is potentially problematic, as Matsuyama's example in (i) shows (Matsuyama 2004: 71):

(i) All ranks joined in hearty cheer after cheer for every member of the royal family …

As Matsuyama points out, the reading of such examples is like the reading of *old men and women*, in which *old* scopes over both *men* and *women*. This is accounted for in structures like the one indicated in (ii):

(ii) hearty [cheer after cheer]

Since adjectives attach to N̄s and not to NPs, this means that NPN constructions should be N̄s. Of course, (ii) cannot be combined with a determiner, so one would have to assume that NPN constructions select for a determiner that has to be dropped obligatorily. Determiners are also dropped in noun phrases with mass nouns with a certain reading.

From looking at this construction, it is clear that it cannot be accounted for by standard X̄ rules. Even without requiring X̄ syntactic rules, there seems to be no way to capture these constructions in head-based approaches like Minimalism, Categorial Grammar, or Dependency Grammar.<sup>29</sup> For simple NPN constructions, one could claim that *after* is the head. *After* would be categorized as a third-person singular mass noun and select for two Ns. It would (non-compositionally) contribute the semantics stated above. But it is unclear how the general schema with arbitrarily many repetitions of *after* N could be accounted for. If one assumes that *day after day* forms a constituent, then the first *after* in (41) would have to combine an N with an NPN sequence.

(41) day after [day [after day]]

This means that we would have to assume two different items for *after*: one for the combination of Ns and another one for the combination of N with NPN combinations. Note that an analysis of the type in (41) would have to project information about the Ns contained in the NPN construction, since this information has to be matched with the single N at the beginning. In any case, a lexical analysis would require several highly idiosyncratic lexical items (prepositions projecting nominal information and selecting items they usually do not select). It is clear that a reduplication account of the NPN construction as suggested by G. Müller (2011) does not work, since patterns with several repetitions of PN as in (41) cannot be accounted for as reduplication. G. Müller (p. 241) stated that reduplication works for word-size elements only (in German) and hence his account does not extend to the English examples given above. (42) shows an attested German example containing adjectives, which means that G. Müller's approach is not appropriate for German either.

(42) Die beiden tauchten nämlich geradewegs wieder aus dem heimischen Legoland auf, wo sie im Wohnzimmer, schwarzen Stein um schwarzen Stein, vermeintliche Schusswaffen nachgebaut hatten.<sup>30</sup>
     the two surfaced namely straightaway again from the home Legoland PART where they in.the living.room black brick after black brick alleged firearms recreated had
     'The two surfaced straightaway from their home Legoland where they had recreated alleged firearms black brick after black brick.'

<sup>29</sup>Hudson (2021: 1476), Chapter 31 of this volume, provides an analysis of the NPN Construction in the framework of Word Grammar. Since Word Grammar is a Dependency Grammar, this seems to falsify my claim, but it does not, since Word Grammar is more powerful than usual Dependency Grammars. Hudson uses a network with some extra syntactic primitives that allow him to account for loops.

<sup>30</sup>Attested example from the newspaper taz, 05.09.2018, p. 20.

Travis (2003: 240) suggested a syntactic approach to the NPN construction. The trees she provides are broken and contain symbols like Spec, so the details of the analysis are unclear, but she assumes that the preposition is of category Q and that Q heads are special reduplication heads. An element from inside of the complement of Q is moved to SpecQP. The analysis raises several questions: why can incomplete constituents move to SpecQP? How is the external distribution of NPN constructions accounted for? Are they QPs? Where can QPs appear? Why do some NPN constructions behave like NPs? How is the meaning of this construction accounted for? If it is assigned to a special Q, the question is: how are examples like (32b) accounted for? Are two Q heads assumed? And if so, what is their semantic contribution?

This subsection showed how a special phrasal pattern can be analyzed within HPSG. The next subsection discusses filler-gap constructions, which were analyzed as instances of a single schema by Pollard & Sag (1994: 164) but which were later reconsidered and analyzed as a family of subconstructions by Sag (1997; 2010).

# **4.2 Specialized sub-constructions**

HPSG took over the treatment of nonlocal dependencies from GPSG (Gazdar 1981; see also Flickinger, Pollard & Wasow 2021, Chapter 2 of this volume on the history of HPSG and Borsley & Crysmann 2021, Chapter 13 of this volume on unbounded dependencies). Pollard & Sag (1994: Chapters 4 and 5) had an analysis of topicalization constructions like (43) and an analysis of relative clauses. However, more careful examination revealed that more fine-grained distinctions have to be made. Sag (2010: 491) looked at the following examples:


As Sag shows, the fronted element is specific to the construction at hand:



A topicalized clause should not contain a *wh*-item (44a), a *wh*-interrogative should not contain a *what a* sequence appropriate for *wh*-exclamatives (44b), and so on.

Furthermore, some of these constructions allow non-finite clauses and others do not:


So there are differences as far as the fillers and the sentences from which something is extracted are concerned. Sag discussed further differences, like inversion/non-inversion in the clauses out of which something is extracted. I do not repeat the full discussion here but refer the reader to the original paper.

In principle, there are several ways to model the phenomena. One could assume empty heads as Pollard & Sag (1994: Chapter 5) suggested for the treatment of relative clauses. Or one could assume empty heads as they are assumed in Minimalism: certain so-called operators have features that have to be checked and cause items with the respective properties to move (Adger 2003: 330–331). Borsley (2006) discussed potential analyses of relative clauses involving empty heads and showed that one would need a large number of such empty heads, and since there is no theory of the lexicon in Minimalism, generalizations are missed (see also Borsley & Müller 2021: Section 4.1.5, Chapter 28 of this volume). The alternative suggested by Sag (2010) is to assume a general Filler-Head Schema of the kind assumed in Pollard & Sag (1994: 164) and then define more specific subconstructions. To take an example, the *wh*-exclamative is a filler-head structure, so it inherits everything from the more general construction, but in addition, it specifies that the filler daughter must contain a *what a* part and states the semantics that is contributed by the exclamative construction.
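The inheritance-based strategy can be sketched as follows; the classes and the string-based filler check are illustrative stand-ins for the constructions involved.

```python
# Sketch of Sag's (2010) strategy: a general filler-head construction and
# a wh-exclamative subconstruction that inherits from it, constrains the
# filler to contain 'what a', and adds exclamative semantics.
# Class and method names are illustrative only.

class FillerHead:                        # cf. the general Filler-Head Schema
    def filler_ok(self, filler):
        return True                      # no construction-specific constraint

    def sem(self, core_sem):
        return core_sem

class WhExclamative(FillerHead):         # inherits and then specializes
    def filler_ok(self, filler):
        return "what a" in filler        # e.g. 'what a fool'

    def sem(self, core_sem):
        return ("exclaim", core_sem)     # constructional semantics

excl = WhExclamative()
print(excl.filler_ok("what a fool"))     # True
print(excl.filler_ok("which book"))      # False: wrong kind of filler
print(excl.sem("he married t"))          # ('exclaim', 'he married t')
```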

# **5 Summary**

This paper summarized the properties of Construction Grammar, or rather Construction Grammars, and showed that HPSG can be seen as a Construction Grammar, since it fulfills all the tenets assumed in CxG: it is surface-based, grammatical constraints pair form and function/meaning, the grammars do not rely on innate domain-specific knowledge, and the grammatical knowledge is represented in inheritance hierarchies. This sets HPSG and CxG apart from other generative theories that either assume innate language-specific knowledge (Minimalism, e.g., Chomsky 2013; Kayne 1994; Cinque & Rizzi 2010) or do not assume inheritance hierarchies for all linguistic levels (e.g., LFG).

I showed why lexical analyses of argument structure should be preferred over phrasal ones, and that there are other areas of grammar where phrasal analyses are superior to lexical ones. Such phrasal patterns can be covered in HPSG, while they are problematic for proposals assuming that all structures have to have a head.

# **Acknowledgments**

I thank Anne Abeillé, Bob Borsley, Rui Chaves, and Jean-Pierre Koenig for comments on earlier versions of this chapter and for discussion in general. I thank Frank Richter for discussion of formal properties of SBCG. Thanks go to Frank Van Eynde for discussion of the Big Mess Construction in relation to SBCG.

# **References**


Sag, Ivan A. 2010. English filler-gap constructions. *Language* 86(3). 486–545.




Aamot, Elias, 66 Abbott, Miriam, 181 Abeillé, Anne, 6, 9, 13, 17, 19, 22, 27, 29, 30, 61, 62, 135, 177, 181, 183, 191, 220, 224, 250, 256, 263, 278–280, 290, 304, 307, 316, 352, 357, 373–377, 379, 381, 386, 391, 421, 424, 426, 428, 430–434, 436, 438– 440, 442, 446–449, 468, 479, 497–499, 509, 516, 519, 524, 526, 540, 542, 546, 550, 553, 554, 563, 582, 611, 617, 622, 623, 625, 626, 631, 670, 673, 682, 689, 698, 730, 734, 736, 738, 739, 748, 750–753, 761, 763, 796, 813, 814, 820, 821, 823, 847, 848, 854, 858–860, 863–865, 870–876, 890, 894, 895, 898, 908, 926, 932, 1012, 1065, 1091, 1168, 1258, 1284, 1288, 1290, 1352, 1361, 1364, 1365, 1367, 1368, 1396, 1424, 1449, 1450, 1466, 1476, 1499, 1507, 1511, 1527 Abels, Klaus, 854 Abner, Natasha, 1201 Abney, Steven P., 277, 381, 1256 Abrusán, Márta, 666 Abzianidze, Lasha, 66 Ackerman, Farrell, 144, 148, 149, 156,

325, 326, 341 Ades, Anthony E., 1340, 1375 Adger, David, 191, 900, 1535 Adolphs, Peter, 1117 Aelbrecht, Lobke, 868 Aguila-Multner, Gabriel, 424, 430, 550, 747 Aissen, Judith, 351, 421, 431 Ajdukiewicz, Kazimierz, v, 3, 371, 377, 1001, 1337 Akmajian, Adrian, 678, 796 Alahverdzhieva, Katya, 1201, 1206, 1212, 1213, 1220, 1221, 1225, 1226 Alexopoulou, Theodora, 1064, 1065 Alibali, Martha W., 1231 Allegranza, Valerio, 278, 287, 295 Allwood, Jens, 673, 1230 Almeida, Diogo A. de A., 854 Almog, Lior, 795 Alotaibi, Ahmad, 516 Alotaibi, Mansour, 264, 566, 617 Alqurashi, Abdulrahman, 557, 558, 616, 619, 620, 646, 740 Alshaalan, Yara, 854 Altmann, Gerry, 676, 1085, 1087 Ambridge, Ben, 666, 1045 An, Aixiu, 748, 1258 Anderson, Carol, 744 Anderson, Stephen R., 14, 131, 133, 958, 961, 966, 968, 971

Andrews, Avery D., 193, 247, 421, 423, 439, 441, 963, 991, 1396, 1405 Antonov, Anton, 190 Aoun, Joseph, 1259, 1260, 1297 Arad Greshler, Tali, 66 Arai, Manabu, 676 Aranovich, Raúl, 127, 179, 347 Argyle, Michael, 1201, 1202 Ariel, Mira, 238 Arka, I. Wayan, 181, 323, 510–512, 920 Arnold, Doug, 30, 256, 278, 300, 307, 526, 552, 554, 569, 580, 634– 636, 640, 701, 746, 750, 893, 910, 912, 1360, 1361, 1475 Arnold, Jennifer E., 1085 Arnon, Inbal, 694, 1093 Aronoff, Mark, 778 Arregui, Ana, 858 Artawa, Ketut, 323 Artstein, Ron, 762 Asher, Nicholas, 1182 Assmann, Anke, 566, 619 Asudeh, Ash, 10, 53, 490, 1012, 1019, 1161, 1212, 1332, 1415, 1420, 1426–1428, 1432, 1433, 1506, 1526 Auer, Peter, 1233 Austin, John L., 1159, 1176 Austin, Peter, 1421 Avgustinova, Tania, 616, 954 Baayen, R. Harald, 1118 Bach, Emmon, 50, 54, 57, 58, 370, 890, 894, 900, 1001, 1356 Bachrach, Asaf, 674 Baerman, Matthew, 956 Baker, Curtis L., 813, 814, 818, 1017

Baker, Mark C., 322, 332, 1271 Baldridge, Jason, 141, 1108, 1120, 1332, 1342, 1343, 1352, 1354, 1356, 1358, 1375 Baldwin, Timothy, 777–779, 781, 1106, 1122 Ball, Douglas L., 179, 194, 197, 199, 200, 1283 Baltin, Mark, 677, 678 Banarescu, Laura, 1123 Bangalore, Srinivas, 1111, 1112 Bangerter, Adrian, 1217 Bar-Hillel, Yehoshua, 58, 1337, 1503 Bargmann, Sascha, 783, 794, 1290, 1530–1532 Barker, Chris, 1333, 1349, 1354, 1367 Barlow, Michael, 221 Barry, Guy, 1375 Barton, Ellen, 853 Bartsch, Renate, 926 Barwise, Jon, 13, 1002, 1025, 1160, 1172, 1176, 1427, 1431 Bauer, Angelika, 1233 Bausewein, Karin, 560, 646, 896, 1269 Bavelas, Janet B., 1207, 1210, 1228 Bayer, Samuel, 259, 753, 754, 1357 Beavers, John, 263, 341, 391, 670, 730, 733, 752, 754, 756, 757, 761, 871, 874, 876, 877, 1355, 1361–1363 Bech, Gunnar, 451 Beecher, Henry, 857 Beermann, Dorothee, 1122 Behaghel, Otto, 373 Behrens, Heike, 1522 Beißwenger, Michael, 1157 Bekki, Daisuke, 1336, 1355

Belletti, Adriana, 812, 833 Bender, Emily M., 7, 31, 64, 66, 68, 93, 122, 157, 177, 178, 189, 203, 264, 283, 381, 402, 495, 514, 516, 729–731, 733, 788, 790, 813, 818, 831, 832, 959, 1044, 1047, 1048, 1056–1058, 1087, 1111, 1116, 1120–1123, 1125, 1213, 1256, 1257, 1259, 1375, 1396, 1423, 1427, 1472, 1484, 1503, 1510, 1511, 1515, 1516 Beneš, Eduard, 399 Bergen, Benjamin K., 1510, 1511, 1522 Berman, Arlene, 299 Berman, Judith, 1396 Bernardi, Raffaella, 1358 Berwick, Robert C., 68, 672, 1257, 1258, 1459, 1505 Bever, Thomas G., 686, 1086, 1095 Bhatt, Rajesh, 236 Bianchi, Valentina, 1503 Bickel, Balthasar, 189, 981 Bierman, Arthur K., 1222 Bierwisch, Manfred, 370, 1256 Bîlbîie, Gabriela, 634, 730, 732, 734, 736, 737 Bildhauer, Felix, 399, 401, 1044, 1047, 1048, 1068–1073, 1258 Billroth, Johann, 1452 Bird, Steven, 9, 93, 1123, 1518 Birdwhistell, Ray L., 1201 Bishop, Dorothy V. M., 1504 Bjerre, Anne, 646 Bjerre, Tavs, 384 Blake, Barry, 252, 1458, 1482 Blakemore, Diane, 635 Blanco, Eduardo, 1126

Blank, Malou, 1156 Blevins, James P., 345, 390, 915, 954, 985, 986, 988, 1457 Bloomfield, Leonard, 50, 728 Blunsom, Phil, 1117, 1127 Boas, Franz, 1262 Boas, Hans C., 265, 610, 631, 1353, 1426, 1507, 1510, 1521 Bochner, Harry, 128, 165 Bod, Rens, 701, 1505 Boeckx, Cedric, 1297, 1302 Boguraev, Bran, 1106 Bolc, Leonard, 616 Bolhuis, Johan J., 1459 Bolinger, Dwight, 681, 694, 893 Bolt, Richard A., 1211 Bonami, Olivier, 33, 127, 130, 162, 163, 177, 290, 391, 424, 442, 469, 470, 472, 473, 475–477, 730, 736, 739, 830, 947, 959, 963, 964, 967, 969, 974, 975, 979, 980, 982–984, 986–988, 990, 991, 1019, 1161 Bond, Francis, 177, 1111, 1116, 1123, 1126 Booij, Geert, 131, 955, 990, 1527 Bopp, Franz, 227 Borer, Hagit, 141 Borgonovo, Claudia, 691 Börjars, Kersti, 1331, 1332 Borsley, Robert D., 6, 7, 15, 17, 24, 30, 33, 34, 61, 62, 64, 68, 135, 181, 184, 190, 191, 196, 197, 200, 220, 223, 224, 238, 256, 264, 278–280, 370–372, 374–376, 379, 381, 384–387, 389, 395, 456, 468, 479, 493, 497, 498, 509, 516, 524, 526, 540, 542,

544, 546, 547, 550, 553–555, 557, 558, 560–562, 566, 580, 582, 598, 607, 611, 616, 617, 619, 620, 622, 626, 635, 636, 638, 639, 645, 646, 666, 668, 728, 730, 734, 736, 738–740, 744, 749, 782, 787, 789, 796, 797, 813, 821, 826, 832, 834, 835, 838, 847, 865, 875, 893, 895, 898, 901, 902, 908, 922, 926, 928, 933, 1008, 1012, 1091, 1106, 1120, 1158, 1161, 1168, 1254, 1257, 1262, 1277, 1279, 1280, 1288, 1296–1298, 1331, 1332, 1340, 1366, 1406, 1407, 1420, 1470, 1475, 1476, 1483, 1505, 1511, 1512, 1516, 1519, 1520, 1534, 1535 Bošković, Željko, 1288 Bosque, Ignacio, 751 Boullier, Pierre, 1107 Bouma, Gosse, 17, 30, 65, 67, 153, 158, 160, 165, 255, 287, 321, 347, 352, 377–379, 384, 386, 424, 427, 450, 509, 526, 544, 548, 550, 581, 638, 668, 678, 699, 865, 893, 902, 908, 911, 922, 1109, 1118, 1119, 1123, 1283, 1420, 1482, 1484, 1502, 1503, 1519 Boyé, Gilles, 163, 964 Brachman, Ronald J., 142 Brame, Michael, 493, 904, 916, 1399 Branco, António, 65, 177, 908, 923, 927, 928, 930–932, 1161, 1373 Branigan, Holly, 1087, 1089–1092 Bratt, Elizabeth Owen, 252

Breckinridge Church, Ruth, 1231 Bredenkamp, Andrew, 893, 922 Bresnan, Joan, v, 3, 49, 128, 130, 186, 227, 277, 322, 357, 489, 490, 493, 496, 517, 527, 559, 581, 699, 760, 827, 893, 894, 948, 968, 1019, 1082, 1083, 1087, 1253, 1269, 1299, 1332, 1395, 1398, 1399, 1401, 1402, 1405, 1413–1416, 1418, 1420, 1421, 1458, 1503 Bressem, Jana, 1202 Brew, Chris, 1511 Briscoe, Ted, 10, 150, 155, 158, 1460, 1511, 1526, 1527 Broadwell, George Aaron, 132, 134, 987 Bruening, Benjamin, 126, 129–131, 236, 750, 1290, 1352 Bruland, Tore, 66 Bruneau, Thomas J., 1201 Bruner, Jerome S., 1216 Bühler, Karl, 1217 Burger, Harald, 778, 781 Büring, Daniel, 1060, 1369, 1374 Burks, Arthur W., 1222 Butt, Miriam, 245, 254, 421, 422, 1120, 1122, 1427 Butterworth, George, 1209 Buys, Jan, 1117, 1127 Bybee, Joan, 1500 Caballero, Gabriela, 955 Cairns, Helen Smith, 687 Calcagno, Mike, 1358 Calder, Jonathan, 1357 Callmeier, Ulrich, 63, 1116, 1117 Carletta, Jean, 1230 Carlson, Greg, 334

Carnie, Andrew, 1261 Carpenter, Bob, 65, 141, 161, 1170, 1344, 1346, 1360, 1428 Carreiras, Manuel, 680 Carroll, John, 1108, 1109, 1120 Carstairs, Andrew, 976 Carter, David, 1112 Cassell, Justin, 1231 Cattell, Ray, 690 Caute, Anna, 1233 Cha, Jong-Yul, 643 Chan, Chung, 642, 643 Chang, Nancy, 1510, 1511, 1522 Chatzikyriakidis, Stergios, 1357 Chaves, Rui P., 9, 27, 30, 33, 61, 70, 131, 220, 256, 263, 357, 379, 391, 430, 440, 493, 538, 543, 549, 551–553, 611, 666, 669– 671, 674, 677–679, 681, 683, 684, 688, 689, 691, 696, 697, 700, 728, 730, 736, 744, 752– 755, 757, 760–763, 796, 848, 855, 870–872, 874–876, 894, 926, 932, 1093, 1094, 1258, 1288, 1298, 1301, 1303, 1305, 1359, 1361, 1363, 1365, 1368, 1449, 1466, 1476, 1499 Chen, Danqi, 1458 Chen, Yufei, 1117, 1126, 1127 Chesi, Cristiano, 1299, 1300, 1503 Choi, Incheol, 424, 469 Chomsky, Noam, 4, 7, 48–50, 128, 184, 191, 220, 246, 275–277, 325, 387, 490, 494, 511, 515, 526, 537, 615, 646, 677, 681, 682, 684, 688, 690, 693, 728, 729, 782, 812, 815, 890– 893, 900, 924, 925, 1081,

1093, 1096, 1106, 1107, 1157, 1159, 1253–1257, 1262, 1263, 1265, 1266, 1268, 1271, 1279, 1286, 1290, 1292, 1301–1305, 1397, 1404, 1405, 1420, 1448, 1457–1459, 1497, 1498, 1501– 1505, 1536 Christiansen, Morten H., 1094 Chung, Chan, 424, 450, 461, 463– 468, 828, 910 Chung, Sandra, 675, 854, 856, 1366 Cienki, Alan, 1202, 1211 Cinque, Guglielmo, 384, 673, 690, 1282, 1504, 1536 Citko, Barbara, 1269, 1288 Clark, Herbert H., 687, 1157, 1164, 1166, 1208, 1217 Clark, Stephen, 1112, 1128, 1375 Clausen, David R., 687, 688 Clemens, Lauren Eby, 196 Clifton, Jr., Charles, 676, 679 Cohen, Philip R., 1163, 1213 Cohen, Shay B., 1258 Cole, Peter, 1259 Collins, Chris, 67, 349 Colston, Herbert L., 778–780 Comrie, Bernard, 58, 189, 190, 235, 322, 379, 895, 896, 1094, 1483 Cook, Mark, 1202 Cook, Philippa, 399, 1047, 1068–1070, 1258 Cook, Susan Wagner, 1231 Cooper, Robin, 93, 860, 862, 930, 1005, 1025, 1166, 1171, 1172, 1175, 1176, 1183, 1207, 1224, 1229, 1371, 1431 Cooperrider, Kensy, 1208, 1209

Copestake, Ann, 7, 10, 13, 19, 24, 26, 62, 63, 65, 66, 68, 130, 150–152, 155, 158, 330, 499, 611, 613, 638, 674, 696, 733, 788, 790, 927, 958, 959, 1002, 1019–1025, 1046, 1112–1114, 1117, 1128, 1171, 1221, 1225, 1264, 1355, 1426, 1428, 1460, 1462, 1508, 1511, 1520, 1526, 1527 Coppock, Elizabeth, 1414 Corbett, Greville G., 181, 228, 229, 746, 748, 956 Corver, Norbert, 780 Costa, Francisco, 65, 177, 1161 Coulson, Seanna, 1206 Crain, Stephen, 1087, 1375 Crawford, Jean, 688 Creary, Lewis G., 56 Creel, Sarah C., 676 Cresti, Diana, 696 Crocker, Matthew Walter, 1502 Croft, William, 1510, 1526 Crouch, Richard, 1012, 1019, 1428, 1433 Crowgey, Joshua, 178, 826, 831, 832 Crysmann, Berthold, 6, 7, 10, 11, 15, 17, 20, 24, 30, 34, 61, 66, 93, 127, 130, 132, 133, 158, 162–165, 177, 184, 256, 260, 263, 264, 320, 355, 370–372, 391, 395, 424, 430, 456, 473, 493, 545–547, 549, 561, 563– 566, 571, 574, 577–579, 598, 616, 617, 620, 622, 626, 636, 638, 639, 666, 668, 730, 736, 744, 747, 754, 756, 757, 787, 789, 797, 813, 826, 874–877,

893, 898, 901, 902, 908, 922, 928, 947, 957, 959, 963–965, 967, 969, 974, 975, 977–980, 982–984, 986–988, 990, 991, 1008, 1019, 1116, 1120, 1254, 1257, 1276, 1277, 1296–1298, 1340, 1355, 1361, 1406, 1420, 1468, 1512, 1519, 1520, 1534 Cubelli, Roberto, 1233 Culicover, Peter W., 68, 577, 673, 683, 689, 691, 699, 700, 737, 738, 760, 780, 848, 849, 855, 856, 860, 893, 894, 923, 926, 1001, 1005, 1045, 1255–1257, 1273, 1275, 1276, 1475 Culy, Christopher, 51, 642, 1397 Curran, James R., 1112, 1375 Curry, Haskell B., 1429 Cutler, Anne, 794 Dąbkowski, Maksymilian, 178 Dąbrowska, Ewa, 1510 Dahl, Östen, 811, 1464 Dalrymple, Mary, 53, 226, 262, 263, 907, 908, 1395, 1396, 1399, 1401, 1406, 1418, 1420, 1422, 1425–1427, 1431, 1432, 1503, 1506 Daniels, Michael W., 258, 260, 262, 263, 754, 1358 Danon, Gabi, 226 Davies, William D., 666, 686 Davis, Anthony R., 15, 16, 20, 21, 138, 145, 158, 181, 190, 193, 194, 220, 222, 316, 321, 327–333, 336, 337, 339, 340, 342, 344– 346, 349, 352, 380, 383, 479, 502, 506, 510, 540, 615, 784, 821, 890, 895, 896, 906, 915,

920, 949, 1002, 1003, 1012, 1013, 1121, 1168, 1254, 1284, 1293, 1352, 1396, 1415–1418, 1483, 1484, 1498, 1506, 1512, 1520, 1521 de Beer, Carola, 1232 De Beule, Joachim, 1510 de Groote, Philippe, 1356 de Kok, Daniël, 1119 De Kuthy, Kordula, 9, 13, 21, 93, 307, 424, 450, 457, 579, 603, 668, 1004, 1044, 1045, 1047–1050, 1052–1055, 1058, 1060–1063, 1066–1069, 1071, 1121, 1162, 1220, 1223, 1282 de Ruiter, Jan Peter, 1202, 1232 de Saussure, Ferdinand, 1158, 1426 de Swart, Henriëtte, 348, 813, 837, 838 De Vicenzi, Marica, 680 Deane, Paul, 674, 688, 690, 691, 1092, 1359 Delahunty, Gerald P., 686 Dellert, Johannes, 1116 DeLong, Katherine A., 676 Demberg, Vera, 1182 den Besten, Hans, 926 den Dikken, Marcel, 737 Dery, Jeruen E., 70, 683, 688, 689 Desmet, Timothy, 680 Desmets, Marianne, 955 Devlin, Keith, 1160 Dhongde, Ramesh Vaman, 599 Di Sciullo, Anna-Maria, 125 Diaz, Thomas, 976, 988–990 Dik, Simon, 1458 DiRusso, Alyssa A., 1231 Dixon, Robert M.W., 189, 193

Dobrovol'skij, Dmitrij, 797 Domaneschi, Filippo, 1201 Donati, Caterina, 646, 1268, 1269 Donnellan, Keith S., 1175 Donohue, Cathryn, 33, 178, 371, 394, 395, 402, 1423 Donohue, Mark, 189 Dorna, Michael, 65 Dörre, Jochen, 65 Dost, Ascander, 132, 133 Dowty, David, 49, 50, 54, 57, 58, 141, 327–329, 334, 335, 341, 390, 760, 890, 926, 1048, 1337, 1339, 1356, 1371, 1521 Drellishak, Scott, 66, 190, 264, 729– 731, 733, 1116 Dridan, Rebecca, 1112, 1118 Drummond, Alex, 684 Dryer, Matthew S., 195, 811 Dubinsky, Stanley, 666, 686 Dukes, Michael, 178 Duncan, Susan D., 1232 Duran-Eppler, Eva, 1458 Dutilh, MWF, 1118 Dyła, Stefan, 256 Eberhard, Kathleen M., 1086 Ebert, Cornelia, 1201, 1212, 1227 Eco, Umberto, 1222 Edlund, Jens, 1157 Egg, Markus, 1002, 1012 Eisele, Andreas, 1116 Eisenberg, Peter, 382, 926 Ejerhed, Eva, 1093 Ekman, Paul, 1201, 1206, 1208 Elbourne, Paul, 683, 1265, 1296 Embick, David, 130 Emele, Martin C., 65

Emerson, Guy, 7, 66, 93, 122, 157, 158, 203, 790, 958, 959, 1087, 1128, 1213, 1256, 1257, 1375, 1484 Emonds, Joseph E., 50 Enfield, Nick J., 1209 Engdahl, Elisabet, 673, 690, 1044, 1046, 1047, 1049, 1051, 1052, 1058, 1063, 1064, 1071, 1093, 1161, 1162, 1220, 1223, 1282 Engelkamp, Judith, 380, 1283 Enger, Hans-Olav, 233, 234 Eppler, Eva Duran, 1450, 1458 Epstein, Samuel David, 68, 1258 Erbach, Gregor, 65, 380, 785, 787, 789, 798, 1283 Erdmann, Oskar, 915 Erjavec, Tomaž, 959, 972, 991 Erk, Katrin, 1128 Ernst, Thomas, 787, 813, 814 Eroms, Hans-Werner, 1450 Erteschik-Shir, Nomi, 673, 674, 688, 690, 1093 Eshghi, Arash, 1182 Espinal, M. Teresa, 795 Evans, Gareth, 928 Everaert, Martin, 793, 1501 Faarlund, Jan Terje, 233, 234 Faghiri, Pegah, 798 Falk, Yehuda N., 1259, 1402 Faltz, Leonard M., 1010, 1432 Fan, Zhenzhen, 66 Fang, Yimai, 1126 Fanselow, Gisbert, 399, 905, 924, 925, 1295 Farkas, Donka F., 348, 512, 513 Farkas, Richárd, 1125 Fassi-Fehri, Abdelkader, 1402

Fast, Jakub, 733 Federmeier, Kara D., 676 Fedorenko, Evelina, 694, 1258 Feldhaus, Anke, 386 Fellbaum, Christiane, 783 Fenstad, Jens Erik, 1427 Ferdinand de Saussure, 1158<sup>1</sup> Fernández, Eva M., 679, 680 Fernández, Raquel, 860, 1157, 1158, 1182 Fernando, Tim, 1166 Ferreira, Fernanda, 681, 686, 687 Feys, Robert, 1429 Fillmore, Charles J., 68, 69, 327, 328, 333, 334, 386, 778, 779, 781, 784, 792, 796, 1016, 1224, 1353, 1415, 1475, 1498–1502, 1506, 1510, 1511, 1520 Findlay, Jamie Y., 794, 1415 Firth, John Rupert, 1128 Fitch, W. Tecumseh, 1262, 1302, 1501, 1504, 1505 Fleischer, Wolfgang, 778 Flickinger, Dan, 3, 4, 12, 13, 18, 21, 34, 56, 57, 62, 63, 65, 66, 68, 141, 142, 162, 177, 371, 514, 582, 596, 701, 790, 927, 948, 1002, 1019, 1084, 1105–1107, 1110, 1112, 1116, 1118, 1124, 1125, 1127, 1129, 1171, 1253, 1264, 1280, 1284, 1303, 1352, 1356, 1427, 1450, 1498, 1501, 1506– 1508, 1515, 1534 Fodor, Janet Dean, 6, 666–668, 672, 679–681, 1092, 1302, 1396 Fodor, Jerry A., 685–687, 1081 Fokkens, Antske, 66, 179, 1117, 1130 Foley, William, 327

Fong, Sandiway, 1257 Ford, Marilyn, 1087 Fordham, Andrew, 1502 Fortin, Catherine, 854 Fox, Danny, 666, 674 Frampton, John, 191 Francis, Elaine J., 646, 1083 Frank, Anette, 386, 927, 1019, 1428 Franz, Alex, 65 Fraser, Bruce, 779, 780 Frazier, Lyn, 679, 686 Frege, Gottlob, 1172 Freidin, Robert, 684, 898 Freudenthal, Daniel, 1505 Frey, Werner, 905, 1265, 1294, 1502 Fricke, Ellen, 1202, 1217, 1218 Fried, Mirjam, 1510 Friederici, Angela D., 1459 Friedman, Joyce, 1120 Fries, Norbert, 896 Friesen, Wallace V., 1201, 1206, 1208 Fritz-Huechante, Paola, 250 Fujii, Mamoru, 51 Fukuda, Shin, 682 Fuß, Eric, 238 Galantucci, Bruno, 1228 Gallin, Daniel, 1034, 1048 Garnsey, Susan, 687 Garrett, M. F., 686 Garrod, Simon, 1228 Gawron, Jean Mark, 53, 331, 671 Gazdar, Gerald, v, 3, 48–51, 54, 55, 58, 276, 371, 384, 385, 399, 522, 523, 540, 568, 669, 678–680, 726, 728, 782, 783, 785, 787, 799, 1095, 1108, 1253, 1257, 1397, 1418, 1427, 1458, 1503, 1534

Geach, Peter Thomas, 422, 1340, 1354, 1525 Gennari, Silvia P., 1094 Georgi, Ryan Alden, 1122 Gerbl, Niko, 647, 1284 Gerdts, Donna, 333, 513 Gerwing, Jennifer, 1210 Ghayoomi, Masood, 177, 222, 1115 Ghodke, Sumukh, 1123 Gibbon, Dafydd, 1229 Gibbs, Jr., Raymond W., 778–780 Gibson, Edward, 675, 676, 679, 685, 694, 1094, 1095, 1258 Ginsburg, Jason, 1257 Ginzburg, Jonathan, v, 4, 7–9, 17, 18, 23, 25, 29, 30, 32, 62, 93, 138, 182, 183, 198, 277, 278, 281–283, 296, 297, 300, 302, 304, 307, 376, 377, 386, 390, 427, 441, 466, 544, 547–550, 560, 567–570, 600, 604, 606, 613, 614, 617, 635, 668, 675, 693–695, 699, 700, 849, 851, 854, 856, 857, 859–865, 868, 869, 871, 930, 1002, 1013, 1014, 1016, 1017, 1157, 1158, 1165–1168, 1170–1172, 1175– 1177, 1179, 1181, 1182, 1185, 1206, 1207, 1209, 1224, 1229, 1233, 1255, 1256, 1277, 1278, 1282, 1353, 1366, 1367, 1427 Giorgolo, Gianluca, 1212, 1415, 1506, 1526 Girard, Jean-Yves, 1428 Gisborne, Nikolas, 1450, 1458 Givón, Talmy, 227, 238 Glucksberg, Sam, 779, 781 Godard, Danièle, 14, 17, 29, 30,

177, 183, 256, 278, 290, 352, 353, 373, 377, 379, 386, 391, 421, 424, 428, 430–434, 436, 438–440, 442, 446– 449, 499, 519, 526, 552, 554, 563, 569, 603, 617, 622, 623, 625, 626, 666, 750, 752, 753, 812–814, 820, 823, 893, 910, 912, 1109, 1284, 1352, 1360, 1361, 1450 Goldberg, Adele E., 14, 126, 141, 159, 165, 330, 336, 344, 666, 674, 688, 690, 691, 1045, 1303, 1354, 1499–1501, 1510, 1521, 1525–1527 Golde, Karin E., 904 Goldin-Meadow, Susan, 1231 Goldsmith, John, 670, 728, 744 Gollan, Tamar H., 1082 Goodall, Grant, 256, 682, 688, 744 Goodluck, Helen, 673 Goodman, Michael Wayne, 959, 1127 Goodman, Nelson, 1222 Goodwin, Charles, 1155, 1208, 1219 Gotham, Matthew, 1428, 1434 Götz, Thilo, 65 Grano, Thomas, 743 Grant, Margaret, 858 Green, Georgia M., 1163, 1164, 1301 Greenberg, Joseph H., 227, 228, 322 Greshler, Tali Arad, 502 Grewendorf, Günther, 253, 912, 915, 1293, 1503 Griffiths, James, 855, 856 Grimshaw, Jane, 191, 559, 1269, 1402 Grinevald, Colette, 228 Grishman, Ralph, 1088 Groenendijk, Jeroen, 1014

Groenendijk, Marius, 65 Grohmann, Kleantes K., 1261 Groos, Anneke, 257, 559, 1269 Gross, Maurice, 779 Groß, Thomas M., 1450 Grosu, Alexander, 559, 637, 667, 679, 680, 691, 760, 1092 Grover, Claire, 252, 550 Guéron, Jacqueline, 684 Guilfoyle, Eithne, 325 Güngördü, Zelal, 616 Gunji, Takao, 380, 422, 627, 628, 632, 1295, 1352 Gussenhoven, Carlos, 1054, 1280 Gutmann, Sam, 191 Guzmán Naranjo, Matìas, 986 Hackl, Martin, 666 Haddar, Kais, 620 Hadjikhani, Nouchine, 1209 Haegeman, Liliane, 238, 812, 1304 Haftka, Brigitta, 384 Hagstrom, Paul, 827 Hahn, Florian, 1222 Hahn, Michael, 620, 646 Haider, Hubert, 246, 253, 384, 575, 576, 1066, 1260, 1502 Haji-Abdolhosseini, Mohammad, 1162 Hale, Kenneth, 357, 422, 1274, 1275 Hall, Edward T., 1202 Halle, Morris, 128, 130 Halliday, Michael A. K., 1161, 1280, 1457 Halvorsen, Per-Kristian, 1426, 1427 Hamblin, Charles Leonard, 1014, 1166 Hammarström, Harald, 178 Han, Chung-hye, 827

Hankamer, Jorge, 857, 859, 871 Harbert, Wayne, 898 Hardt, Daniel, 854, 858 Harley, Heidi, 1275 Harman, Gilbert H., 1458 Harris, Alice C., 132, 133, 547, 955, 963, 967 Harris, Randy Allen, 49 Harris, Zellig Sabbetai, 1128 Harwood, William, 868 Hashimoto, Chikara, 1107, 1117 Haspelmath, Martin, 129–131, 180, 184, 186, 192, 423, 729, 797 Hassamal, Shrita, 30, 507, 854, 863, 864 Haug, Dag Trygve Truslew, 1428 Haugereid, Petter, 65, 177 Hauser, Marc D., 1262, 1302, 1501, 1504, 1505 Hawkins, John A., 672, 854, 1082, 1094, 1096 Head, Brian F., 235 Healey, Patrick G. T., 1182 Hegarty, Michael, 690 Heim, Irene, 1001, 1161, 1431, 1501 Heinz, Wolfgang, 191, 246, 249–251 Heldner, Mattias, 1157 Hellan, Lars, 65, 66, 177, 908, 923, 1122, 1484 Hemforth, Barbara, 761, 858 Henderson, John M., 681, 686 Hendriks, Petra, 1364, 1366 Henri, Fabiola, 506–509, 513, 516, 521, 545, 546 Herbelot, Aurélie, 66 Herring, Joshua, 1106 Hershcovich, Daniel, 1114 Herzig Sheinfux, Livnat, 66

Higginbotham, James, 402 Higgins, Francis Roger, 782 Hillyard, Steven A., 676 Himmelmann, Nikolaus P., 189 Hinrichs, Erhard W., 352, 380, 390, 420, 424, 442, 443, 450, 452, 453, 455, 457, 526, 603, 619, 646, 1296 Hiramatsu, Kazuko, 688 Ho, Jia Qian, 795 Hobbs, Jerry R., 1088 Hoberg, Ursula, 399 Hockett, Charles F., 50, 51, 957, 958, 968 Hoeksema, Jack, 1355 Hofmeister, Philip, 70, 674–676, 681, 683, 688, 736, 1093, 1258, 1359 Höhle, Tilman N., 9, 93, 97, 349, 370, 410, 442, 1292, 1317, 1506, 1518, 1525 Holler, Anke, 634 Holmberg, Anders, 1302 Hopcroft, John E., 1107 Horn, George M., 780 Horn, Laurence, 666 Hornstein, Norbert, 494, 514, 1005, 1261, 1302, 1304 Horvat, Matic, 1127 Horvath, Julia, 604 Hough, Julian, 1182, 1230 Howard, William A., 1347, 1353, 1429 Hsiao, Franny, 1095 Huang, Cheng-Teh James, 682, 684, 690 Huck, Geoffrey J., 681 Huddleston, Rodney, 560, 602, 640, 682, 683, 728, 742, 753, 760,

763, 1269 Hudson, Richard, 277, 755, 762, 874, 1112, 1161, 1303, 1450, 1458– 1460, 1463, 1464, 1466, 1468, 1469, 1471, 1480, 1482, 1533 Hukari, Thomas E., 30, 165, 257, 379, 513, 544, 548–552, 666, 668, 671, 675, 676, 683, 684, 689, 691, 692, 695, 762, 910–912, 914, 915, 923, 1358, 1359 Humberstone, Lloyd, 1340 Hunter, Julie, 1182, 1227 Hunter, Tim, 1503 Hust, Joel, 493 Hutchins, Sean, 681, 686 Huybregts, Riny, 1256 Iida, Masayo, 513, 910 Imrényi, András, 1453, 1454 Ingria, Robert J. P., 257, 258 Iordăchioaia, Gianina, 1019, 1033 Ishihara, Roberta, 603 Ishikawa, Akira, 1399 Itakura, Shoji, 1209 Ivanova, Angelina, 1113, 1114 Jackendoff, Ray S., 9, 21, 57, 68, 128, 130, 135, 165, 276, 334, 349, 492, 700, 737, 738, 780, 813, 848, 849, 855, 856, 860, 890, 893, 920, 926, 1001, 1005, 1045, 1049, 1162, 1255–1257, 1273, 1275, 1276, 1290, 1475, 1500, 1521, 1526, 1530 Jacobs, Joachim, 1052 Jacobson, Pauline, 384, 492, 500, 700, 754, 851, 852, 1340, 1366, 1367, 1370, 1430, 1431, 1523 Jacques, Guillaume, 190

Jaeger, T. Florian, 1093 Jäger, Gerhard, 1107, 1345–1347, 1366 Jaggar, Philip J., 546, 563 Jakob, Hanna, 1232 Janda, Richard D., 1355 Jelinek, Eloise, 187 Jespersen, Otto, 1452 Jezek, Elisabetta, 128, 142 Jiménez–Fernández, Ángel, 683, 688 Job, Remo, 680 Johannessen, Janne Bondi, 730, 1288, 1289 Johansson, Gunnar, 1222, 1223 Johnson, David E., 570, 571, 728, 1161, 1183 Johnson, Kyle, 1361, 1364 Johnson, Mark, 259, 1108, 1357, 1525 Johnston, Michael, 1213, 1231 Jones, Bob Morris, 813, 832, 834, 835, 838, 1288 Joos, Martin, 1305 Josefsson, Gunlög, 233, 234 Joshi, Aravind K., 53, 1108, 1111, 1112, 1527 Jurafsky, Daniel, 128, 161, 952, 965, 970, 971, 1527 Jurka, Johannes, 682 Just, Marcel Adam, 1096 Kahane, Sylvain, 1452 Kallmeyer, Laura, 1506 Kamide, Yuki, 676, 1085 Kamp, Hans, 1003, 1161, 1427 Kandybowicz, Jason, 699 Kanerva, Jonni M., 1415 Kaplan, David, 1204 Kaplan, Ronald M., 263, 1019, 1253, 1299, 1332, 1401, 1418, 1427, 1503, 1506

Karimi, Simin, 513 Karimi=Doostan, Gholamhossein, 470 Karttunen, Lauri, 761 Kasami, Tadao, 51, 1108 Kasper, Robert T., 60, 378, 1002, 1009–1011, 1283 Kasper, Walter, 5, 63, 65, 1116 Kathol, Andreas, 32, 33, 52, 133, 226, 234, 235, 238, 263, 349, 384, 388, 390, 395–399, 424, 450, 455, 526, 569, 570, 575, 637, 638, 646, 834, 912, 1161, 1164, 1356, 1449, 1471 Kato, Yasuhiko, 825 Katzir, Roni, 674 Kauffman, Lynn E., 1202 Kay, Martin, 220, 1018, 1019, 1109, 1120, 1396, 1409 Kay, Paul, 6, 165, 300, 514, 631, 793, 794, 796, 875, 1353, 1498, 1499, 1501, 1506, 1507, 1510, 1511, 1515, 1521 Kayne, Richard S., 599, 615, 667, 678, 679, 682, 694, 730, 750, 926, 1089, 1288, 1290, 1536 Keenan, Edward L., 58, 190, 200, 201, 322, 379, 895, 896, 1010, 1033, 1094, 1369, 1432, 1483 Kehler, Andrew, 670, 671, 728, 744, 858 Kelepir, Meltem, 825 Keller, Frank, 158, 372, 395, 572–574, 577–579, 638, 676, 678, 1263 Kellogg, Brainerd, 1452 Kelly, Spencer D., 1206 Kemper, Susan, 687 Kempson, Ruth, 1157, 1158, 1170

Kendon, Adam, 1157, 1201, 1202, 1209, 1211, 1228 Kennedy, Christopher, 1366, 1367 Kern, Franz, 1453, 1455 Keyser, Samuel Jay, 336, 357, 422, 1274, 1275 Kibort, Anna, 1415 Kibrik, Aleksandr E., 547 Kiefer, Bernd, 66 Kihm, Alain, 957 Kikuta, Chiharu Uda, 641, 642 Kim, Christina S., 858, 859 Kim, Jin-Dong, 1125 Kim, Jong-Bok, 30, 64, 65, 158, 177, 263, 300, 306, 307, 352, 386, 391, 424, 450, 461–467, 469, 502, 522, 523, 525, 526, 610, 627–629, 636, 642–644, 646, 647, 678, 700, 752, 756, 763, 812–823, 826, 828, 829, 831– 833, 835, 838, 848, 852–855, 860, 864, 865, 868–870, 875, 1276, 1361, 1366–1368, 1450, 1515 Kim, Su Nam, 777–779, 781 Kim, Yong-Beom, 642 Kimball, John, 685 King, Adriana, 666 King, Jeffrey C., 1169 King, Jonathan, 1096 King, Paul, 60, 91, 98, 116, 117, 162, 401, 784, 1168 King, Tracy Holloway, 226, 254, 1120, 1122, 1128, 1506 Kiparsky, Paul, 971, 1415 Kipp, Michael, 1229, 1230 Kiss, Tibor, 32, 253, 287, 382, 386, 395, 424, 442, 450, 526, 575–

577, 579, 611, 638, 639, 905, 925, 1295, 1296, 1363, 1502 Kita, Sotaro, 1202, 1211, 1232 Klein, Ewan, 3, 9, 48, 55, 93, 371, 540, 782, 1001, 1162, 1220, 1253, 1357, 1503, 1518 Klein, Wolfgang, 896, 926, 1156, 1158, 1218 Klima, Edward S., 816, 818 Klimov, Georgij A., 189 Kluender, Robert, 672–676, 683, 685, 688, 690, 1096, 1359 Kobele, Gregory M., 1258 Koch, Peter, 1156 Koenig, Jean-Pierre, 7, 13, 15, 16, 19– 21, 27, 61, 93, 97, 128, 130, 138, 145, 157, 161, 163–165, 178, 181, 186, 187, 190, 193, 194, 202, 220, 222, 224, 282, 316, 321, 328–332, 336, 337, 339, 340, 342, 344, 345, 348, 352, 354, 380, 383, 423, 424, 479, 499, 502, 506, 510, 540, 577, 611, 615, 638, 641, 733, 784, 788, 796, 818, 821, 838, 871, 890, 895, 896, 905–908, 915, 920, 927, 947–949, 952, 954, 960–965, 970–972, 985, 987, 990, 991, 1002, 1003, 1005, 1012, 1013, 1046, 1092, 1113, 1121, 1168, 1169, 1221, 1254, 1264, 1284, 1293, 1306, 1352, 1355, 1396, 1415–1418, 1427, 1465, 1483, 1484, 1498, 1506, 1512, 1520, 1521, 1527 Kohrt, Annika, 691 Kolarova, Zornitza, 1210 Kolb, Hans-Peter, 1502

Kolliakou, Dimitra, 1064, 1065 Kong, Anthony Pak-Hin, 1232 Konieczny, Lars, 1081 Konietzko, Andreas, 1045 König, Esther, 1503 Kopp, Stefan, 1214 Kordoni, Valia, 66 Kornai, András, 1304 Koster, Jan, 677, 1293 Kothari, Anubha, 666 Kouloughli, Djamel-Eddine, 1451 Kouylekov, Milen, 1123 Kraak, Esther, 422, 1356 Krahmer, Emiel, 1217 Kranstedt, Alfred, 1210, 1217, 1218, 1231 Kratzer, Angelika, 1001, 1160, 1431, 1501 Krauss, Michael, 179 Krauss, Robert M., 1202, 1232 Krenn, Brigitte, 785, 787, 789, 798 Krer, Mohamed, 826 Krieger, Hans-Ulrich, 949, 950, 959, 960, 1117, 1118 Krifka, Manfred, 1052, 1053, 1161 Kripke, Saul A., 1160 Kroch, Anthony, 665, 676, 696 Kroeger, Paul R., 198, 512, 1425 Kropp Dakubu, Mary Esther, 179, 1405 Kruijff-Korbayová, Ivana, 1044 Kruyt, Johanna G., 1118 Kübler, Sandra, 1458 Kubota, Yusuke, 141, 263, 287, 599, 646, 671, 872, 890, 1019, 1108, 1161, 1332, 1344, 1348–1350, 1352, 1355, 1356, 1359–1365, 1367, 1368, 1430, 1497

Kuhlmann, Marco, 1342 Kuhn, Jonas, 1282 Kuhnle, Alexander, 66, 1128 Kühnlein, Peter, 1218 Kuiper, Koenraad, 798 Kuno, Susumu, 513, 666, 671, 673, 674, 684, 685, 688, 690, 898 Kupffer, Manfred, 1204 Kupść, Anna, 65, 813, 826, 835, 838 Kuroda, Sige-Yuki, 904 Kutas, Marta, 675, 676, 687, 1096 Kynette, Donna, 687 Ladd, Robert D., 694 Ladewig, Silva, 1202 Ladusaw, William A., 334, 1004 Lahm, David, 155, 156, 1033 Laka, Itziar, 185 Lakoff, George, 128, 670, 671, 728, 744, 1231, 1303, 1510 Lambek, Joachim, 59, 60 Lambrecht, Knud, 6, 641, 796, 1257, 1478 Landau, Idan, 494 Landman, Fred, 334, 637 Langacker, Ronald W., 906, 1510 LaPolla, Randy J., 322, 346, 493 Laporte, Éric, 779 Lappin, Shalom, 570, 571, 673, 674, 728, 1161, 1183 Larson, Richard K., 1285, 1288 Larsson, Staffan, 860, 1161 Lascarides, Alex, 24, 150–152, 813, 818, 1002, 1182, 1212, 1218, 1220, 1460, 1462 Lasnik, Howard, 49, 679, 682, 687, 812, 1256, 1374 Lau, Ellen, 676 Laubsch, Joachim, 65

Laughren, Mary, 394 Laurens, Frédéric, 506, 509, 513, 516, 521, 522, 634 Lausberg, Hedda, 1230 Law, Vivien, 1451 Le, Quoc Anh, 1231 Lee, Hanjung, 853 Lee, Jungmee, 671 Lee, Juwon, 179, 423 Lee, Sun-Hee, 424, 464, 469 Lee-Goldman, Russell, 636, 1164 Lees, Robert B., 615 Lenerz, Jürgen, 1292 LeSourd, Philip S., 236, 237 Letcher, Ned, 1107 Leung, Tommi, 854 Levelt, Willem J. M., 1232 Levesque, Hector J., 1163 Levin, Beth, 316, 327, 332, 338, 342, 344, 353, 1483 Levin, Nancy S., 670, 728, 744 Levine, Robert D., 30, 64, 165, 256–258, 263, 379, 523, 544, 548–552, 617, 666, 668, 670, 671, 675, 676, 683, 684, 689, 691, 692, 695, 728, 744, 745, 755, 762, 894, 910–912, 914, 915, 923, 1169, 1297, 1332, 1344, 1348–1350, 1358–1365, 1367, 1368 Levinson, Stephen C., 1157, 1217, 1374 Levy, Leon S., 53 Levy, Roger, 259, 260, 262, 676, 680, 1358 Lewis, David, 1164, 1212 Lewis, M. Paul, 179 Lewis, Mike, 1375

Lewis, Richard L., 686 Lewis, William D., 1122 Li, Yen-hui Audrey, 1259, 1260 Liang, Tie, 66 Lichte, Timm, 1506 Lieber, Rochelle, 949 Lin, Chien-Jer Charles, 1095 Lin, Dekang, 1257 Linardaki, Evita, 701, 1083 Linell, Per, 1157 Link, Godehard, 395, 611, 1363 Lipenkova, Janna, 66, 177, 179, 912, 1115 Lipták, Anikó, 855, 856 Lobeck, Anne, 868 Lødrup, Helge, 798 Loehr, Daniel, 1201, 1205 Lohmann, Arne, 743 Lohnstein, Horst, 1293, 1502 Lønning, Jan Tore, 63, 1112 López, Luis, 868 Lucas, Christopher, 640 Lücking, Andy, 93, 862, 930, 1157, 1162, 1164, 1172, 1173, 1202–1205, 1207, 1212, 1214, 1217, 1218, 1221, 1222, 1224, 1228–1231, 1233, 1299 Luhtala, Anneli, 1451 Luo, Zhaohui, 1357 Luutonen, Jorma, 981 MacDonald, John, 1087 MacDonald, Maryellen C., 1082, 1087, 1090, 1091, 1094 Machicao y Priemer, Antonio, 165, 250, 277, 295, 381, 615, 908, 912, 923 Macken, Elizabeth E., 66 MacKinlay, Andrew, 1125

Maekawa, Takafumi, 304, 306 Mak, Willem M., 676 Malchukov, Andrej, 245 Maling, Joan, 247, 254, 255 Malouf, Robert, 5, 65, 67, 151–155, 179, 254, 297–299, 325, 509, 544, 893, 902, 908, 911, 949, 1002, 1019, 1109, 1118, 1482 Manandhar, Suresh, 65, 1045, 1048, 1065 Manning, Christopher D., 16, 66, 181, 318, 322, 325, 345, 347, 421–423, 439, 441, 509, 510, 546, 890, 910, 915, 917–920, 1352, 1396, 1405, 1458 Marandin, Jean-Marie, 391 Marantz, Alec, 126, 128, 130, 141 Maratsos, Michael, 1094, 1096 Marciniak, Małgorzata, 65 Marcus, Mitchell P., 1119, 1123 Marcus, Ruth Barcan, 1160 Marimon, Montserrat, 65, 177, 1118, 1123 Markantonatou, Stella, 795, 796 Marneffe, Marie-Catherine de, 1458 Marrafa, Palmira, 908 Marshall, Catherine R., 1164 Marslen-Wilson, William D., 1086, 1300 Martell, Craig, 1229 Martin, Scott, 1336 Martin-Löf, Per, 1172, 1335, 1357 Masataka, Nobuo, 1217 Mateu, Jaume, 795 Matiasek, Johannes, 191, 246, 249–251 Matsuki, Kazunaga, 1087 Matsuyama, Tetsuya, 1290, 1530–1532 Matsuzaki, Takuya, 1112, 1118, 1119 Matthews, Danielle, 1217 Matthews, Peter H., 955, 958, 1457 Mauner, Gail, 336, 348 May, Robert, 674, 684, 1005, 1265, 1431 McCawley, James D., 132, 673, 760, 787, 1274, 1364 McClave, Evelyn, 1205 McCloskey, James, 231, 232, 563, 1297, 1359 McGinn, Colin, 1217 McGinnis, Martha, 495 McGurk, Harry, 1087 Mchombo, Sam A., 128, 130, 227, 827, 1413, 1414 McMurray, Bob, 1086 McNeill, David, 1157, 1201, 1202, 1208–1210, 1215, 1217, 1226, 1227, 1229, 1232 Megerdoomian, Karine, 470 Mehler, Alexander, 1217, 1221, 1228 Meinunger, André, 1260 Meisel, Jürgen M., 1504 Mel'čuk, Igor A., 1458 Melnik, Nurit, 66, 620, 633, 1115, 1117 Merchant, Jason, 700, 851, 853, 855–857, 860, 863, 864, 869, 1366 Merenciano, Josep-Maria, 1366 Meurers, Walt Detmar, 10, 20, 65, 66, 155, 156, 181, 189, 253, 345, 386, 402, 424, 442, 450, 457, 526, 677, 948, 1044, 1052–1055, 1060–1063, 1066–1069, 1115, 1169, 1258, 1296, 1502, 1525, 1527 Michaelis, Laura A., 6, 816, 818, 821, 869, 1257, 1498, 1501, 1507 Michel, Daniel, 676 Michelson, Karin, 178, 186, 187, 202, 354, 965, 1306, 1483 Mihaliček, Vedrana, 1356 Miller, Philip, 16, 136, 158, 160, 161, 181, 252, 321, 347, 352, 424, 428, 525, 761, 823, 850, 851, 856–860, 865, 868–870, 959, 1258, 1366, 1511 Milward, David, 762 Mineshima, Koji, 1336, 1355, 1375 Minsky, Marvin, 57 Mitchell, Erika, 827 Mithun, Marianne, 189 Miyamoto, Edson T., 1095 Miyao, Yusuke, 65, 1083, 1119, 1125, 1511 Moeljadi, David, 66 Molimpakis, Emilia, 854 Momma, Shota, 1082 Monachesi, Paola, 16, 158, 159, 188, 252, 423, 424, 430–435, 479, 833 Montague, Richard, 13, 49, 1013, 1172, 1427, 1431 Moore, John, 325, 341 Moortgat, Michael, 59, 1332, 1333, 1344, 1352, 1355, 1356, 1359, 1428 Moosally, Michelle J., 749 Moot, Richard, 1344, 1375 Morante, Roser, 1126 Morgado da Costa, Luis, 1125 Morgan, Jerry, 852, 853 Morrill, Glyn, 59, 754, 1332, 1333, 1335, 1344, 1347, 1356, 1357, 1359, 1360, 1364–1366, 1375, 1428 Moshier, M. Drew, 60 Mouret, François, 730, 732, 734, 736, 756–759, 763, 872, 876 Muansuwan, Nuttanart, 179, 1005, 1284 Mulder, Jean Gail, 200 Mullen, Tony, 278 Müller, Cornelia, 1202, 1208, 1211 Müller, Gereon, 572, 793, 1290, 1304, 1501, 1533 Müller, Stefan, 5–8, 13–15, 17, 23, 27, 32–34, 52, 58, 61, 63–70, 91, 130, 133, 138–140, 156, 158, 159, 165, 177–179, 181, 182, 189, 191, 196, 199, 203, 222–225, 238, 246, 250, 253, 254, 277, 278, 295, 307, 316, 322, 345, 347, 349–351, 353, 355, 356, 372, 376, 379–381, 383, 384, 386–391, 395–397, 399–402, 424, 442, 443, 446, 450–452, 455–460, 462, 466, 473, 474, 491, 493, 497, 499, 509, 510, 515, 518, 522, 524, 526, 544, 550, 558–560, 563, 570, 572–575, 578, 579, 582, 596, 599, 603, 605, 609, 615, 616, 627, 631, 633, 636–639, 644, 646, 677, 678, 730, 749, 753, 757, 782, 783, 788, 796, 798, 813, 817, 848, 853, 874, 875, 877, 896, 900, 903–905, 908, 910, 912, 915, 922–925, 933, 951, 954, 969, 990, 1003, 1016, 1018, 1063, 1068, 1069, 1091, 1106–1108, 1115, 1116, 1119–1121, 1129, 1158, 1160, 1161, 1254, 1256, 1258, 1261, 1262, 1264, 1268–1270, 1282–1284, 1286, 1288, 1290, 1291, 1293, 1296, 1298, 1299, 1305, 1331, 1334, 1352–1356, 1360, 1366, 1368, 1373, 1396, 1397, 1407, 1415, 1423, 1424, 1427, 1430, 1448–1450, 1459, 1460, 1462, 1464, 1465, 1471, 1472, 1474, 1475, 1477, 1483, 1497, 1498, 1502, 1503, 1505–1507, 1510–1513, 1515, 1516, 1520, 1522–1528, 1535 Munn, Alan, 671 Muskens, Reinhard, 1349, 1356 Musso, Mariacristina, 1304 Muszyńska, Ewa, 1118 Mykowiecka, Agnieszka, 65, 616 Na, Younghee, 681 Nakamura, Michiko, 1095 Nakazawa, Tsuneko, 352, 380, 390, 420, 424, 442, 443, 450, 452, 453, 455, 457, 526, 603, 619, 646, 1296 Nanni, Debbie L., 602, 603 Neeleman, Ad, 691 Nerbonne, John, 65, 278, 283, 637, 949, 950, 959, 960, 1002, 1019, 1171, 1525 Netter, Klaus, 278, 284–287, 386 Neu, Julia, 66 Neumann, Günter, 701 Neumann, Werner, 399 Newman, Paul, 545, 565 Newmeyer, Frederick J., 49, 677, 782, 813, 1267, 1274, 1301, 1302, 1359 Ni, Weijia, 676

Nichols, Johanna, 189, 1398 Nishigauchi, Taisuke, 673 Nivre, Joakim, 1119, 1123 No, Yongkyoon, 465 Noh, Bokyung, 1524 Nordlinger, Rachel, 949, 1403, 1472 Nöth, Winfried, 1202 Noyer, Rolf, 971, 973 Nunberg, Geoffrey, 778, 780, 782, 783, 785, 789, 791, 794, 797, 1157, 1164, 1219 Nunes, Jairo, 1261 Núñez, Rafael, 1209 Nykiel, Joanna, 263, 391, 502, 523, 525, 636, 700, 752, 756, 763, 848, 854, 863, 864, 870, 1361, 1366–1368 O'Connor, Mary Catherine, 1498, 1499, 1501, 1506 Oehrle, Richard T., 59, 1332, 1333, 1348, 1349, 1352, 1356, 1364 Oepen, Stephan, 5, 65, 66, 68, 1110, 1112, 1116, 1117, 1123, 1126, 1127 Oesterreicher, Wulf, 1156 Oliva, Karel, 380, 386 Oostdijk, Nelleke, 1118 Oppenheimer, Daniel M., 1217 Ørsnes, Bjarne, 66, 177, 182, 381, 386, 915, 1115, 1293, 1498 Osborne, Timothy, 1450 Osenova, Petya, 66 Oshima, David Y., 665 Ott, Dennis, 646, 1270 Owens, Jonathan, 1451 Özyürek, Aslı, 1201, 1202, 1210, 1232 Packard, Woodley, 66, 68, 1117, 1126

Paggio, Patrizia, 1044, 1046, 1048, 1055, 1056, 1058, 1059 Park, Byung-Soo, 646 Park, Myung-Kwan, 682 Park, Sang-Hee, 733, 763, 764, 871, 1355, 1364–1366 Partee, Barbara H., 675, 890, 894, 900, 1004, 1161, 1284, 1339, 1362 Patejuk, Agnieszka, 263 Paul, Hermann, 915 Payne, John R., 811, 827 Peacocke, Christopher, 1222 Peldszus, Andreas, 1020 Penn, Gerald, 65, 66, 569, 570, 1115, 1169, 1170, 1428 Percival, Keith, 1453, 1456, 1494 Perles, Micha A., 1503 Perlmutter, David M., 322, 345, 348, 349, 421, 431, 514, 698, 916, 1363, 1404, 1458, 1482 Perry, John, 13, 1002, 1160, 1172, 1176, 1427 Pesetsky, David, 191, 675, 693, 694, 728 Peters, Stanley, 50, 56 Petrick, Stanley Roy, 1120 Petten, Cyma Van, 687 Pfeiffer, Thies, 1202, 1231 Philippova, Tatiana, 854 Phillips, Colin, 676, 687, 689, 760, 1082, 1093, 1299, 1300 Pickering, Martin, 1089–1092, 1375 Pietroski, Paul, 1257, 1505 Piirainen, Elisabeth, 797 Pike, Kenneth, 1457 Pinker, Steven, 147, 330, 339, 342, 1500

Pittner, Karin, 896 Plank, Frans, 189 Poesio, Massimo, 1157, 1158, 1182, 1207, 1212, 1229 Pogodalla, Sylvain, 1335 Polinsky, Maria, 196, 236, 682 Pollard, Carl, v, 3, 4, 8, 12, 13, 17, 20, 29, 31, 32, 34, 35, 48, 51, 55–62, 65, 89–94, 97–100, 103, 113, 117–120, 138, 140–142, 158, 161, 165, 198, 200, 222, 224–226, 232, 234, 246, 247, 249–252, 259, 260, 262, 277–280, 282, 283, 291, 296, 304, 317, 318, 332, 345, 371, 377, 380, 382, 390, 395, 396, 456, 466, 477, 492, 493, 496, 500, 502, 512–514, 517–519, 527, 540, 541, 544, 545, 547, 548, 551, 556, 568, 569, 574, 575, 596, 599–601, 604, 606–610, 612, 615, 617, 620, 621, 630, 634, 637–640, 642, 668, 670, 672, 673, 683, 690, 729, 731, 733, 734, 739, 745, 751, 783–786, 890, 891, 894–896, 898, 900–905, 908, 909, 912, 923, 924, 927, 933, 947, 964, 1001–1010, 1023, 1046, 1048, 1051, 1084, 1089, 1096, 1105, 1107, 1158, 1159, 1161, 1168, 1170, 1171, 1175, 1176, 1253, 1264, 1267, 1280, 1285, 1286, 1293, 1296, 1297, 1301, 1303, 1305, 1349, 1352, 1356, 1358, 1360, 1361, 1368, 1370, 1373, 1401, 1405, 1418, 1425–1427, 1450, 1458, 1480, 1497, 1498, 1506, 1508, 1512, 1516–1518, 1520, 1534, 1535 Pollock, Jean-Yves, 384, 812, 815, 823, 1276 Pompigne, Florent, 1335 Poornima, Shakthi, 423, 424 Posner, Roland, 1202, 1228 Postal, Paul M., 5, 322, 345, 490, 494, 671, 673, 679, 683, 763, 1254, 1256, 1264, 1273, 1274, 1363, 1458, 1482 Potsdam, Eric, 236 Potts, Christopher, 636 Poulson, Laurie, 66 Poyatos, Fernando, 1201 Prince, Alan, 5, 971 Prince, Ellen F., 670, 728, 744 Pritchett, Bradley L., 667 Proudian, Derek, 56, 65 Przepiórkowski, Adam, 30, 65, 138, 181, 190, 194, 246, 252–255, 259, 263, 380, 383, 402, 504, 506, 813, 826, 835, 836, 838, 910, 951, 1009, 1263, 1357, 1358, 1396, 1483, 1528 Pullum, Geoffrey K., 3, 5, 48–50, 55, 92, 256, 371, 379, 523, 540, 560, 602, 640, 727, 782, 858, 859, 868, 896, 1108, 1253, 1255, 1256, 1263, 1264, 1304, 1503 Pulman, Stephen G., 787 Purver, Matthew, 860, 1167, 1168, 1171, 1172, 1175 Pustejovsky, James, 128, 142 Puthawala, Daniel, 1365 Putnam, Michael T., 70, 493, 666, 697, 700, 1093, 1298, 1301, 1303, 1359 Quine, Willard Van Orman, 1219 Quirk, Randolph, 297 Radford, Andrew, 1261, 1288 Raffelsiefen, Renate, 1355 Ramchand, Gillian, 1284 Ranta, Aarne, 1120, 1122, 1172, 1335 Rappaport Hovav, Malka, 316, 327, 342, 344, 353 Rayner, Keith, 686 Reali, Florencia, 1094 Reape, Mike, 17, 33, 52, 133, 182, 238, 263, 287, 371, 377, 384, 455, 462, 464, 570, 575, 636, 749, 1063, 1106, 1258, 1283, 1355, 1449, 1471, 1474, 1511 Reed, Alonzo, 1452 Reid, Nicholas, 228, 229 Reinhart, Tanya, 890–892, 926, 928 Reintges, Chris H., 549, 571, 620 Reiplinger, Melanie, 1125 Reis, Marga, 318, 370, 400, 1404 Reisberg, Daniel, 1459 Rentier, Gerrit, 424, 450, 1109 Retoré, Christian, 68, 1344 Reuland, Eric, 890, 926 Reyle, Uwe, 927, 1003, 1019, 1161, 1427 Richards, Marc, 1272 Richards, Norvin, 795 Richter, Frank, 4, 6–8, 12, 13, 19, 27, 61, 65, 91, 93, 97, 99, 104, 113, 117, 119–121, 134, 135, 143, 145, 161, 162, 224, 260, 278, 282, 339, 401, 499, 502, 577, 611, 638, 733, 783, 784, 788, 790–792, 818, 838, 871, 927, 1002, 1012, 1019, 1033, 1034, 1046, 1051, 1088, 1107, 1109, 1113, 1168, 1169, 1171, 1221, 1255, 1260, 1264, 1303, 1355, 1396, 1410, 1427, 1428, 1465, 1497, 1507, 1508, 1511, 1520 Richter, Stephanie, 666 Riehemann, Susanne Z., 157, 165, 786, 788–791, 947–951, 954, 990, 1002, 1019, 1527–1529 Riemsdijk, Henk van, 1256 Rieser, Hannes, 1182, 1207, 1212, 1218, 1221, 1222, 1229, 1233 Riezler, Stefan, 890, 912 Ristad, Eric Sven, 56 Ritchart, Amanda, 699 Ritchie, Robert W., 50 Rizzi, Luigi, 384, 387, 431, 436, 495, 1280–1282, 1305, 1504, 1536 Roach, Kelly, 51 Roberts, Craige, 763 Roberts, Ian, 1288, 1302 Roberts, Taylor, 132 Robins, Robert, 1451, 1452 Rochemont, Michael S., 577, 678 Rodrigues, Cillene, 854 Roeper, Thomas, 336 Rogers, Andy, 514 Rogers, James, 1107 Rohde, Douglas L.T., 1123 Roland, Douglas, 676, 1092, 1094 Rooth, Mats, 1060, 1339 Röpke, Insa, 1232 Rosch, Eleanor, 1460 Rose, Miranda L., 1232 Rosen, Alexandr, 504 Rosenbaum, Peter S., 490 Ross, John Robert, 522, 537, 552, 572, 573, 598, 639, 667, 670, 672, 673, 678, 680, 682, 684, 696, 698, 728, 731, 850, 851, 1092, 1093, 1279, 1360, 1363 Ross, Malcolm, 189 Rosta, Andrew, 1458 Rounds, William C., 60 Rouveret, Alain, 1288, 1297 Runner, Jeffrey T., 127, 179, 347, 697, 859 Ruwet, Nicolas, 514, 780, 782, 783, 797 Ruys, Eddy, 674 Ryu, Byong-Rae, 264, 424, 469 Saah, Kofi, 673 Sabbagh, James, 674, 760 Sacks, Harvey, 1157, 1181 Sadler, Louisa, 300, 746, 749, 750, 949, 1288 Sag, Ivan A., v, 3–9, 12, 13, 16–18, 20, 23, 25, 29–33, 35, 48, 51, 55, 57, 60–64, 69, 70, 89–94, 98–100, 103, 113, 117, 118, 120, 130, 136, 138, 140, 142, 151, 158, 160, 161, 165, 178, 181–183, 198, 222, 224–226, 232, 234, 237, 246, 247, 249, 252, 260, 262, 263, 265, 277–283, 291, 295–297, 299, 300, 302, 304, 307, 317–319, 321, 332, 345, 347, 352, 371, 376, 377, 381, 386, 387, 390, 391, 394, 395, 402, 424, 427, 428, 432, 433, 441, 466, 477, 478, 492, 493, 495, 496, 500, 502, 509, 510, 512–514, 517–519, 522–527, 540, 541, 544–551, 553, 556–558, 560, 567–570, 574, 583, 596, 599–601, 603–615, 617, 618, 620–622, 628, 630–632, 634–636, 639, 640, 642, 668, 670, 672–676, 678, 683, 688–690, 692–695, 697, 699–701, 729–731, 733, 734, 739, 740, 742, 743, 745, 747, 751, 752, 754, 756, 757, 761, 779, 780, 782–786, 788, 792, 799, 813–819, 837, 838, 847, 849, 854, 857, 859–865, 869, 871, 873–877, 890, 891, 893–896, 898, 900–905, 908–912, 915, 917–920, 923, 924, 927, 933, 947, 959, 964, 1001–1007, 1009, 1010, 1013, 1014, 1016, 1017, 1019, 1023, 1046, 1048, 1051, 1081, 1089, 1093, 1095, 1096, 1107, 1109, 1158, 1159, 1161, 1168, 1170, 1171, 1175, 1176, 1183, 1185, 1253, 1255–1258, 1264, 1267, 1277, 1278, 1282, 1285, 1286, 1288, 1289, 1293, 1296, 1297, 1299, 1301, 1303, 1352, 1353, 1355, 1359–1368, 1370, 1373, 1374, 1396, 1397, 1401, 1407, 1418, 1423, 1425–1427, 1450, 1457, 1458, 1480, 1482, 1497, 1498, 1501, 1503, 1506–1521, 1534, 1535 Sailer, Manfred, 6, 13, 611, 638, 783, 791, 794–796, 838, 905, 1002, 1019, 1033, 1034, 1048, 1171, 1264, 1355, 1484, 1501, 1508 Saito, Mamoru, 682, 687 Saleem, Safiyyah, 66, 180 Salem, Maha, 1231

Salkoff, Morris, 1257 Salomon, Ralf, 1231 Salvi, Giampaolo, 438 Samvelian, Pollet, 14, 177, 352, 353, 373, 424, 469–472, 475–478, 499, 519, 526, 752, 798, 959, 964, 983, 1109, 1352, 1450 Sandfeld, Kristian, 641 Sandøy, Mads H., 66 Sanfilippo, Antonio, 348 Santorini, Beatrice, 682 Sauerland, Uli, 615, 683, 1265, 1296 Schabes, Yves, 1108, 1507, 1527 Schachter, Paul, 189, 615, 859 Schäfer, Ulrich, 66, 1116, 1118, 1125 Schegloff, Emanuel A., 1181 Schein, Barry, 334 Schenk, André, 780, 797 Schiel, Florian, 1230 Schlangen, David, 1020 Schlenker, Philippe, 1212, 1221, 1227 Schmeh, Katharina, 856 Schmerling, Susan, 670 Schmidt, Paul, 308 Schmidt, Thomas, 1229 Schmolze, James G., 142 Scholz, Barbara C., 5, 92, 1255, 1263, 1264 Schuster, Sebastian, 1126 Schütze, Carson T., 1258 Schwarz, Florian, 906 Schwarzschild, Roger, 1060 Scott, Dana S., 60 Searle, John R., 1015 Sedivy, Julie C., 1182 Seifart, Frank, 228 Seki, Hiroyuki, 51 Sekine, Kazuki, 1232

Selkirk, Elizabeth O., 1045, 1054, 1067 Sells, Peter, 300, 306, 424, 450, 461, 561, 610, 822, 1515 Sériot, Patrick, 1453 Sgall, Petr, 1458 Shabes, Yves, 51 Shamir, Eliahu, 1503 Shan, Chung-chieh, 1333, 1354 Sheinfux, Livnat Herzig, 798 Shieber, Stuart M., 50, 1397, 1410 Shiraïshi, Aoi, 70, 761, 874–876, 1258, 1367, 1368 Siegel, Melanie, 63, 65, 177, 1111, 1116 Siegel, Muffy A., 1364, 1365 Siewierska, Anna, 180, 181, 190, 1458 Simons, Gary F., 179 Simpkins, Neil, 65 Simpson, Jane, 394 Sirai, Hidetosi, 627, 628, 632 Situation Semantics, 13 Slama-Cazacu, Tatiana, 1202 Slayden, Glenn C., 1117 Sloetjes, Han, 1229, 1230 Smolensky, Paul, 5, 971 Snider, Neal, 666, 677, 680, 1093 Snijders, Liselotte, 1420 Soehn, Jan-Philipp, 792 Solberg, Lars Jørgen, 1124 Solias, Teresa, 1364 Sonesson, Göran, 1222 Song, Sanghoun, 177, 1044, 1047, 1048, 1050, 1051, 1056–1060, 1073, 1282 Sorace, Antonella, 1263 Sowa, Timo, 1222 Spector, Benjamin, 666 Spencer, Andrew, 245, 307, 958, 982

Spevack, Samuel C., 1086, 1087 Sportiche, Dominique, 892 Sproat, Richard, 1288 Sprouse, Jon, 675, 676, 682, 688, 1093 Spurk, Christian, 66 Srivastav, Veneeta, 599 Staal, Johan F., 1415 Stabler, Edward, 67, 68, 1106, 1257, 1258, 1503 Stalnaker, Robert, 1164 Stanojevic, Milos, 1258 Starke, Michael, 1304 Stassen, Leon, 516 Staub, Adrian, 676, 679 Steedman, Mark, 3, 51, 53, 371, 377, 599, 1001, 1019, 1044, 1087, 1108, 1258, 1289, 1332, 1333, 1337, 1338, 1340, 1342, 1343, 1352, 1354, 1355, 1358, 1361, 1364, 1370–1375 Steels, Luc, 1217, 1510 Steffen, Jörg, 66 Stepanov, Arthur, 682 Sternefeld, Wolfgang, 1260, 1261 Stiebels, Barbara, 1528 Stillings, Justine T., 602, 603 Stjepanović, Sandra, 854 Stokhof, Martin, 1014 Stone, Matthew, 1218, 1220 Stowell, Timothy, 515, 678 Streeck, Jürgen, 1208 Stroop, John Ridley, 1087 Strunk, Jan, 666, 677, 680 Stucky, Susan U., 679, 680 Stump, Gregory T., 14, 163, 958, 959, 963, 964, 966–968, 971, 975, 977, 979, 980, 987, 1355 Sugayama, Kensei, 1458

Suñer, Margarita, 438 Suppes, Patrick, 66, 1125 Svartvik, Jan, 1165 Sweet, Henry, 1452 Swingle, Kari, 761 Swinney, David A., 794 Sygal, Yael, 1106 Szabolcsi, Anna, 665, 1369 Szczegielniak, Adam, 854 Tabor, Whitney, 676, 681, 686 Taghvaipour, Mehran A., 177, 561, 566, 617, 646, 1297 Takahashi, Masako, 53 Takami, Ken-ichi, 684, 688, 690 Tallerman, Maggie, 1469 Tam, Wai Lok, 263, 391, 874, 1355, 1357, 1362, 1363 Tanenhaus, Michael K., 1085, 1086, 1300 Taraldsen, Knut T., 673 Taylor, John, 1460 Tegey, Habibullah, 132 Tesnière, Lucien, 1448, 1453, 1458, 1474 Tessendorf, Sedinha, 1202 Thiersch, Craig L., 370, 1502 Thomas, Alun R., 1288 Thuilier, Juliette, 265 Tian, Ye, 1182 Toivonen, Ida, 1506, 1526 Tomasello, Michael, 126, 1157, 1521 Tooley, Kristen M., 1087 Torr, John, 67, 1106, 1257, 1258, 1503 Torrego, Esther, 191 Torreira, Francisco, 1157 Toutanova, Kristina, 66, 1111, 1117, 1118, 1129 Traugott, Elizabeth, 1450, 1458

Traum, David, 1161, 1182 Travis, Lisa deMena, 1290, 1533 Trawiński, Beata, 778 Traxler, Matthew J., 679, 1087, 1094 Tribout, Delphine, 955 Triesch, Jochen, 1231 Troseth, Erika, 303 Trousdale, Graeme, 1450, 1458 Trueswell, John C., 1086, 1087 Truswell, Robert, 690, 691 Tseng, Jesse, 177, 290 Tsiamtsiouris, Jim, 687 Tsujii, Jun'ichi, 65, 1083, 1119, 1511 Tsuruoka, Yoshimasa, 1119 Tuite, Kevin, 1205 Tuller, Laurice A., 564 Tyler, Lorraine, 1086 Uhmann, Susanne, 1054 Ullman, Jeffrey D., 1107 Uszkoreit, Hans, 56, 62, 65, 372–374, 377, 380, 572, 680, 1283, 1503, 1522 Vaillette, Nathan, 560, 561, 617, 1297 Valentín, Oriol, 1365 Vallduví, Enric, 1044, 1046, 1047, 1049, 1051, 1052, 1058, 1063, 1064, 1071, 1161, 1162, 1220, 1223, 1282 van Benthem, Johan, 1336 van der Beek, Leonoor, 1123 van der Sluis, Ielka, 1217 Van Eynde, Frank, 31, 70, 223, 278, 287, 288, 292, 294, 295, 300, 301, 306–308, 381, 518, 522, 599, 678, 679, 697, 923, 958, 1161, 1255, 1258, 1515 van Genabith, Josef, 1428

Van Koppen, Marjo, 1288 van Noord, Gertjan, 65, 67, 158, 377, 378, 384, 386, 424, 450, 526, 922, 1118, 1119, 1123, 1283, 1502 van Riemsdijk, Henk, 257, 559, 568, 647, 1269 van Schijndel, Marten, 676 van Trijp, Remi, 126, 1511, 1526, 1527 Van Valin, Jr., Robert D., 322, 327, 328, 346, 493, 688, 690, 890, 1521 Varaschin, Giuseppe, 894, 924 Vasishth, Shravan, 827, 1095 Veenstra, Mettina Jolanda Arnoldina, 1502 Velldal, Erik, 1118, 1125 Vennemann, Theo, 926 Verbeke, Saartje, 190 Vergnaud, Jean Roger, 615 Versante, Laura, 1209 Verspoor, Cornelia Maria, 1524 Vicente, Luis, 671, 853 Vijay-Shanker, Krishnamurti, 51 Vikner, Sten, 812 Villata, Sandra, 675 Villavicencio, Aline, 746–749, 790 Villoing, Florence, 955 Vivès, Robert, 673 Vladár, Zsuzsa, 1453, 1454 von der Malsburg, Christoph, 1231 von Stechow, Arnim, 1052, 1054 Wagner, Petra, 1201 Wahlster, Wolfgang, 62, 1116, 1230 Wald, Benji, 227 Waldron, Benjamin, 1118 Wali, Kashi, 599

Walker, Heike, 611, 638, 640, 893, 910, 911, 914, 915 Walther, Markus, 9, 93, 1518 Wang, Rui, 66 Wanner, Eric, 1094, 1096 Warner, Anthony, 813, 818–820 Wasow, Thomas, 3–5, 23, 31, 34, 56, 57, 64, 65, 70, 141, 283, 371, 381, 495, 596, 681, 687, 701, 780, 782, 783, 785, 1081, 1084, 1087, 1105, 1107, 1161, 1182, 1253, 1255, 1263, 1264, 1280, 1299, 1303, 1352, 1356, 1366, 1374, 1375, 1450, 1503, 1506, 1508, 1510, 1511, 1515, 1516, 1521, 1534 Wax, David, 1122 Webber, Bonnie, 858 Webelhuth, Gert, 144, 148, 149, 156, 424, 442, 580, 581, 601, 602, 616, 830, 1044, 1052, 1067, 1070 Wechsler, Stephen, 10, 14, 16, 20, 53, 58, 61, 138, 139, 181, 185, 190, 194, 220–222, 225, 226, 229–235, 238, 281, 316, 323, 325, 327, 328, 331, 351, 355, 380, 383, 424, 469, 475, 490, 502, 506, 510–512, 540, 615, 745, 749, 821, 890, 894–896, 906, 915, 920, 922, 923, 954, 1002, 1003, 1012, 1013, 1019, 1091, 1121, 1160, 1161, 1168, 1254, 1263, 1266, 1293, 1332, 1352, 1354, 1396, 1397, 1409, 1411–1417, 1483, 1484, 1498, 1510, 1512, 1521–1524, 1526 Weinberg, Amy S., 672

Weinreich, Uriel, 779 Weir, David J., 51, 1107 Weissmann, John, 1231 Wells, Rulon S., 390 Wesche, Birgit, 386 Westergaard, Marit, 1504 Wetta, Andrew C., 388, 390, 400 Wexler, Kenneth, 760, 926 White, Michael, 1375 Whitman, Neal, 763, 875, 1356 Wichmann, Søren, 189 Wilder, Chris, 1367 Wilkins, David, 1209 Wilkins, Wendy, 890 Williams, Edwin, 125, 516, 552, 762, 890, 1259 Willis, David, 1297 Wiltschko, Martina, 573 Wiltshire, Anne, 1202 Winckel, Elodie, 623, 1258 Winkler, Susanne, 402, 894, 1045 Winter, Yoad, 669, 1223 Wintner, Shuly, 66, 1106 Witten, Edward, 679 Wittenburg, Peter, 1229 Wittgenstein, Ludwig, 479 Wright, Abby, 646, 912 Wu, Ying Choon, 1206 Wurmbrand, Susanne, 1259 Xia, Fei, 1122 Xue, Ping, 908 Yang, Chunlei, 177 Yang, Jaehyung, 65, 177, 628 Yankama, Beracah, 1257, 1505 Yatabe, Shûichi, 263, 391, 729, 754–757, 761, 874–877, 1355, 1357, 1361–1363

Yatsushiro, Kazuko, 1296 Yi, Eunkyung, 1092 Yip, Moira, 254 Yngve, Victor H., 728, 729 Yoo, Eun-Jung, 97, 98, 252, 424, 450, 461, 463, 646, 647, 1007–1009 Yoon, Jae-Hak, 643 Yoshida, Masaya, 854, 1366 Ytrestøl, Gisle, 1124 Yu, Jiye, 1125 Zaenen, Annie, 247, 254, 255, 495, 511, 761, 1359, 1503 Zalila, Ines, 620 Zamaraeva, Olga, 1122, 1126 Zanuttini, Raffaella, 811, 812, 823 Zec, Draga, 513 Zeevat, Henk, 1025, 1357 Zeijlstra, Hedde, 812, 814 Zhang, Niina Ning, 671 Zhang, Yi, 1117, 1118 Zimmermann, Thomas Ede, 1170 Zlatić, Larisa, 221, 225, 226, 229–231, 233, 745, 749, 1411–1413 Zwart, Jan-Wouter, 727 Zwarts, Frans, 665 Zwarts, Joost, 1223 Zwicky, Arnold M., 256, 303, 726, 727, 1120, 1449 Żywiczyński, Przemysław, 1208

# **Language index**

Abelam, 731 Algonquian, 264 American Sign Language, 1215, 1216 Arabic, 195n27, 264, 502n9, 543, 557n18, 558n19, 566, 616, 618–620, 633n47, 646, 740, 854, 1402, 1477, 1606 Iraqi, 570–571 Libyan, 826 Syrian, 197n29 Archi, 190n21, 547, 1288n29 Austronesian, 178, 189, 195n27, 264, 323, 325, 918 Western, 918 Bahasa Indonesia, 854 Balinese, 139, 323–325, 346, 510–513, 920–922 Baltic, 954n3 Bambara, 51, 1397 Bantu, 748, 1413, 1504 Basque, 178, 184, 185, 831, 1504 Batsbi, 963, 967 Bavarian, 619n31, 1230 Bulgarian, 66, 616n29, 854 Burushaski, 178 Catalan, 432, 446, 795, 1063, 1064 Celtic, 178, 748 Chi-Mwi:ni, 917 Chicheŵa, 1413 Chinese, 50, 1095

Chintang, 981 Choctaw, 987 Classical Armenian, 254 Coast Tsimshian, 178, 199, 200 Coptic, 549, 570 Czech, 504 Danish, 66, 177, 182n9, 381, 384n8, 386, 646, 673n7, 915n27, 1046, 1059, 1115 Dogon, 642n60 Dutch, 51, 65, 278, 287–290, 294, 307, 378, 384, 386, 424, 450, 499n8, 526n26, 780, 783, 797, 798, 1118, 1123, 1293, 1352n16, 1356, 1608 Edo, 423 Egyptian, 570, 677, 684 Coptic, 571 English, 5, 7, 13–15, 28, 33, 52, 54, 58, 61–66, 89, 92, 98, 113, 115, 116, 118, 120, 121, 127, 130, 145, 147, 151, 152, 154–157, 162, 177, 178, 180, 183n10, 195, 196, 198, 220, 227, 231, 252, 256–258, 261, 278, 286, 288, 290, 293, 299n20, 304, 306, 315, 322–324, 330, 331, 333, 336, 338, 340, 342, 344, 348, 349, 351, 352, 369, 370, 374, 375, 377, 379–381, 383, 384, 386, 394, 432, 439, 460n26, 496, 502, 506, 514, 515, 522, 526, 538, 550n12, 552, 557, 568–571, 575, 596–599, 601, 603, 605, 609, 610n20, 616, 617, 619–621, 626, 628, 629, 632, 634, 635, 641, 643, 644, 646, 681, 682, 684n53, 694, 734–740, 742, 745, 746, 750, 778n4, 779, 782, 783, 793–795, 797, 798, 812–818, 820n10, 823, 824, 827, 854, 855, 857n11, 858n12, 869n23, 871, 892, 894, 898, 904, 908, 911, 912, 915, 917, 918, 920, 924, 947, 952, 953, 956, 1004, 1006, 1016, 1020–1023, 1049–1051, 1070, 1071, 1073, 1082, 1083, 1094–1096, 1109n9, 1110, 1115, 1116, 1118, 1119, 1122–1126, 1129, 1159, 1161, 1163, 1165, 1180, 1216, 1230, 1255, 1261, 1262, 1265, 1266n14, 1269, 1275, 1277, 1287, 1290, 1292–1294, 1300, 1335, 1342, 1352, 1356, 1357, 1369, 1402, 1404–1408, 1410, 1411, 1416n15, 1423, 1425, 1428n23, 1471, 1472, 1477, 1482, 1483, 1503, 1518, 1522, 1526, 1533, 1605, 1606 African American Vernacular, 516, 545n4, 1503 Estonian, 985, 986 Finnic Balto-, 915n27

Finnish, 254, 255, 265, 827 Finno-Ugric, 263, 986 Flemish, 238

French, 136, 159, 160, 163, 177, 181, 183n10, 195, 232, 234, 235, 252, 278, 290, 292, 304, 307, 321, 344, 348, 351, 352, 377, 386n9, 391, 420, 421, 424–426, 429, 436n11, 446–449, 469, 493, 506, 507, 514n19, 516, 526n26, 545, 550n12, 559, 563n24, 568, 570, 603n10, 617, 619, 622–626, 631n44, 634n50, 640, 641, 682, 697–699, 736, 738–740, 748, 751, 752, 758, 759, 783, 797, 811–818, 820n10, 822–824, 837, 838, 854, 855, 858n12, 859n13, 871, 875, 954, 957, 1115, 1159, 1186, 1230, 1233, 1356, 1368, 1476, 1606

Georgian, 66, 333

German, 33, 62–66, 144, 148–150, 157, 177, 195, 200n30, 246, 249–251, 253, 256, 257, 263, 265, 278, 286, 287n11, 318, 345, 349, 351, 369, 370, 372, 373, 377–384, 386, 388, 390, 391, 394, 395, 397–402, 404, 420, 424, 441–443, 450, 451, 456–458, 460, 461, 463, 466n30, 467n31, 479, 491–493, 497–499, 515, 526n26, 550n12, 558–560, 567–569, 572, 575, 603n10, 605n14, 616n29, 619n31, 634, 640, 644–646, 682, 778n4, 779n6, 780, 783, 785, 792, 794, 795, 797, 798, 813n5, 851, 853n5, 892n7, 904, 905, 912, 915, 918, 924, 925, 950, 952, 955, 956, 959, 965, 969, 970, 974, 1049, 1066, 1068–1070, 1085, 1115, 1116, 1119, 1156, 1159, 1203n1, 1205, 1206, 1211n5, 1217, 1230, 1256, 1260, 1269, 1286, 1288, 1290, 1292, 1293, 1296, 1404, 1405, 1407, 1409, 1455, 1459, 1460, 1471, 1503, 1506, 1511, 1516, 1522–1524, 1526, 1528, 1533 Swiss, 51, 1397 Germanic, 64, 196, 235, 278, 383, 384, 386, 460n26, 857, 1120 Gothic, 254 Greek, 66, 235, 288, 502n9, 795, 797, 798, 854, 1064 Gujarati, 190 Gungbe, 1504 Halkomelem, 333 Hausa, 66, 543, 545, 546, 561, 563–565, 616n29 head-final languages, 468 Hebrew, 66, 561, 562, 795, 798 Hindi, 235, 236, 423, 424, 599n1, 639n55, 739n7, 827, 1115 Hindi-Urdu, 190 Hungarian, 265, 348, 698, 798, 851, 853, 868, 1261, 1262 Icelandic, 235, 247, 250, 252, 255n7, 265, 495, 1293, 1405 Indonesian, 66, 195n27 Irish, 550, 566, 619n31 Iroquoian, 178, 187, 354, 988, 1306 Italian, 159, 252, 278, 281, 286, 288–290, 292, 295, 348, 422–424, 429, 431–436, 438, 439, 441, 443, 444, 446–449, 479, 509, 812, 813, 832–835, 837, 871, 982, 984

Japanese, 28, 63, 65, 177, 321, 322, 379, 422, 499n8, 513, 570, 626–628, 632n46, 641n59, 642n61, 673n7, 682, 692, 729, 734, 735, 825, 826, 1059, 1060, 1095, 1116, 1123n30, 1286, 1295, 1296, 1352n16, 1356, 1605 Javanese, 195n27 Kanuri, 731 Kimaragang, 178, 198, 201 Kinyarwanda, 333 Korean, 65, 177, 252, 264, 265, 420, 424, 450, 461, 463, 465–469, 479, 499n8, 626–628, 631, 641, 642, 673n7, 692, 811–813, 827–831, 852, 853, 864n18, 868n22, 1605 Kurdish Sorani, 983 Lakhota, 642n60 Latin, 1454 Lezgian, 178, 190n21, 191 Linkhood Constraint, 1065 Lummi, 1082, 1083 Macushi, 178, 180, 181, 183 Malagasy, 201 Malayalam, 692 Maltese, 66, 178, 1115 Mandarin Chinese, 66, 177, 864n18, 908, 912, 918, 1115, 1230, 1260 Marathi, 190, 599n1, 639n55 Mari, 981 Mauritian, 506–509, 513, 521, 545, 546, 854, 863

Moro, 325, 333 Ndebele, 749 Nepali, 979, 989 Ngan'gityemerri, 228 Nias, 264, 265, 546n8, 547 Norwegian, 63, 65, 66, 71, 177, 673n7, 798, 908 Nyanja, 979 Oneida, 178, 186, 187, 202n31, 354, 355, 988, 989, 1306, 1483 Pashto, 132, 133 Passamaquoddy, 236–238 Persian, 66, 177, 420, 424, 467n31, 469, 472, 479, 480, 499n8, 513, 526n26, 561, 566, 646, 798, 1115 Polish, 6, 7, 65, 252, 256, 259, 260, 263, 265, 504, 506, 616n29, 698, 826n12, 835–838, 849, 850, 854, 863, 1032, 1033 Portuguese, 65, 133, 177, 431, 436n11, 446, 746, 748, 854, 908, 918 European, 980n14 Quechua, 642n60 Romance, 16, 64, 188, 235, 278, 344, 351, 379, 420, 421, 424, 428, 430, 431, 433, 435, 436, 439–442, 444, 446, 449, 456, 458n25, 460, 461, 467n31, 468, 479, 480, 493, 499n8, 673n7, 736, 746, 748, 750, 798, 857, 1591 Romanian, 436n11, 443n14, 447, 479, 513, 634n50, 737, 871, 1033

Russian, 259, 349, 350, 504, 569, 698, 854, 917 Salish, 1082 Scandinavian, 64 Scottish, 63 Semitic, 178, 516, 748 Serbo-Croatian, 221–223, 225, 229, 230, 569, 745, 854, 1411, 1412 Slavic, 235, 504, 516, 569, 570, 798, 954n3 Balto-, 915n27 Spanish, 65, 177, 195, 229, 320, 431, 434, 436, 438–441, 443, 444, 446–449, 479, 682, 738, 751, 833, 854, 1020, 1071, 1072, 1115, 1123n30 Sundanese, 195n27 Swahili, 163, 961–963, 967, 972, 976, 979–981, 987 Swedish, 233, 673n7, 1186 Tagalog, 195, 264, 325, 1425, 1426 Telugu, 731 Thai, 66, 1284, 1374 Toba Batak, 140, 325, 918–920 Tongan, 178, 197, 199 Tsez, 236 Turkish, 131, 132, 134, 424, 616n29, 811, 812, 825, 826, 954n3 Udi, 132, 133, 547 Wambaya, 66, 402, 1472 Warlpiri, 33, 178, 370, 371, 394, 397, 402–403, 1402, 1421, 1423 Welsh, 7n6, 28, 178, 196, 197, 238, 524, 539, 547, 561–563, 566, 645n65, 646, 749, 832, 834, 835, 838, 1287, 1288, 1297, 1469, 1470, 1474

Yiddish, 66, 1115 Yimas, 986 Yucatec Maya, 178

# **Subject index**

◯, 17n21, 159, 182n8, 353, 391, 969, *see* relation, shuffle ∪, 24n32, 969 ↦, 13n16, 155, *see* lexical rule ⊕, 15, 375n2, *see* relation, append /, 151, 152, 550, 1016 ], 611n23, 969, 1017, 1065n15 +, 1166, 1532 /H, 1114n16 /QEQ, 1114n16 ⊕, 499n8 ⇒, 13n16, *see* constraint, implicational AB Grammar, 1332, 1333, 1337 ABC Grammar, 1337 abstract gesture, 1211 accent, 1049–1051, 1070–1073, 1223 A-accent, 1049, 1162 B-accent, 1049, 1162 accept, 1182 acceptability, 1089 acceptance, 1165, 1166 accessibility hierarchy, 322, 1094n11 ACE, 1117 acknowledgment, 1165 ACQUILEX, 62 acquisition, 1158, 1301–1303, 1503–1505, 1521–1522 across-the-board extraction, 551, 552, 561, 562 adaptor gesture, 1208 addressee, 1160

adjacency pair, 1181 adjective, 555–557, 1129 adposition postposition, 28 preposition, 28 adverb, 822, 1125 as complement, 255, 352, 824 affiliate, 1212, 1214 affiliation, 1204–1206 affix, 1276 affixation, 131, 157 AGGREGATION, 1121–1122 Agree, 5, 1117, 1265–1266, 1303 agreement, 61, 390, 889 closest conjunct, 748 object, 963, 987, 1504 Agreement Marking Principle (AMP), 232 alignment, 189 Alpino, 1112n14, 1118–1119, 1123 Alpino Treebank, 67 alternation, 324, 330, 335, 338–351 diathesis, 332 voice, *see* passive ambiguity, 1086, 1091, 1110, 1112, 1117, 1123, 1124, 1128–1129 lexical, 1129 modifier attachment, 1110, 1112 part-of-speech, 1110 spurious, 1124 syntactic, 1129

American structuralism, 1127 AMI Meeting Corpus, 1230 analysis order effect of, 1129–1130 anaphor, 322, 324, 1285 null, 326, *see* obviation annotation, 1112, 1123, 1124 automatic, 1124 anthology search, 66 anti-passive, 798 Anvil, 1229 aphasia, 1232 append, 102 application, 1115–1119, 1124–1127 e-commerce customer email response, 66 education, 1124–1125 grammar correction, 66 Arc Pair Grammar, 1273n19 argument term, 322 Argument Cluster Coordination, 756, 1339 argument indexing, 180 argument label, 1114 argument realization, 316, 319–326, *see* linking phrasal approaches to, 355–357 Argument Realization Principle, *see* principle argument structure, 1121, 1415n13, 1523 art (software), 1117 artificial dataset, 1128 aspect, 798 attitude reports, 1172 attributive use, 1175 augmented/adaptive communication, 63

Austinian proposition, 1175 auxiliary verb, 1119 Babel, 1119 background condition, 1163 background gesture, 1208 background information, 1159–1161 Backward Slash Elimination, 1344 Backward Slash Introduction, 1344 Bare Argument Ellipsis, 848 basic integration scheme, 1213 basic type, 1173 baton, 1209 beat gesture, 1209 behavioral properties, 190n20 belief objects, 1172 bi-directionality, 1107 Big Mess Construction, 1515 bimodal integration, 1214 binary-branching, 1119 binding, *see* anaphor Binding Theory, 61, 138–140, 224, 379n5, 896, 1285–1374 biomedical NLP, 1125, 1126 BioNLP 2009 Shared Task, 1125 bound variable, 1114 bound word, 778 branching, 1276 binary, 5, 1271 non-binary, 524 unary, 1284n27 bridging, 1157 British lexicology, 1128 broad-coverage grammar, 1118, 1128 BV, 1114 c-command, 199, 891, 1265 canonical sign, 321 canonical *synsem*, 160

Cartography, 1280 case, 245–265 case frame, 1122 Categorial Grammar (CG), 141, 259, 371, 384, 599n3, 1001, 1019, 1331, 1501, 1521, 1525, 1530, 1533 Combinatorial ~ (CCG), 1108, 1112, 1120n25, 1332, 1337–1343 Hybrid Type-Logical ~ (Hybrid TLCG), 1348 Linear ~ (LCG), 1356 Categorial Type Logics, 1344 categorisation game, 1217 category Conj, 1288n30 functional Num, 1276 T, 1266, 1276 causative, 798 Celex, 1118 character viewpoint, 1227 Chomsky hierarchy, 1305 chronemics, 1201 chunking, 1123 circumstantial features, 1159–1161, 1175 CKY algorithm, 51 clarification potential, 1167–1169, 1171, 1176 clarification request, 1167–1169, 1172 classifications, 1173 clause type, 1282 clause union, 352, 353 CLIMB, 1117, 1130 clitic, 16, 132, 133, 136, 751, 798 endoclitic, 132–134 in Romance languages, 159, 160 clitic left dislocation, 1064 cliticization, 51 co-verbal gesture, 1202–1208 coding properties, 190n20 Cognitive Grammar, 68, 1510 coherence, 968 coherence field, 456n20 collocation, 781 comment, 1161 commitment-stores, 1166 common ground, 1164, 1166 common knowledge, 1166 communicative interactions, 1175 community membership, 1164, 1165 comparative correlative, 553, 737–740, 796, 1257, 1499–1500 competence, 781, 1081, 1157, 1158, 1165, 1177 competence hypothesis, 1081, 1083, 1084, 1088 complement, 1109 completeness, 968 complex antecedent, 1115 Complex NP Constraint, 572, 577 complex predicate, 373, 1450n1 complexity average-case, 1108, 1109 computational, 1107–1110 of grammars, 1106 time, 1108 worst-case, 1108, 1109, 1119 compositionality, 1127 comprehension, 1085–1088, 1090, 1094n11, 1263 computational linguistics, 1256 computational tractability, 1084 concatenation, 51

conceptual literacy, 1156 conceptual orality, 1156 conceptual vector meaning, 1224 conduit metaphor, 1210 conjunct constraint, 1093 conominal, 180 Consistency, 1401n7 constituent boundary, 1123 discontinuous, 1522 ordering, 61 prosodic, 1162 Constituent Order Principle, 51 constraint, 8 implicational, 13, 96–97, 145, 146 constructicon, 1506 construction, 62, 1092, 1499, 1500, *see* schema Construction Grammar (CxG), 68, 141, 165, 355, 784, 788, 1106, 1298, 1353, 1497–1536 Berkeley, 1510–1511 Cognitive, 1510 Embodied, 1510, 1511 Fluid, 1510, 1511, 1527 Radical, 1510 Sign-Based, 265, 387, 524, 596, 610n20, 793, 910n23, 1353, 1510–1511 Construction Morphology, 1527 construction-like object, 1510 Constructional HPSG, 62, 788–790 context, 1087, 1090, 1164 context dependence, 1157 context-free backbone, 1108 context-free grammar (CFG), 1109, 1123 contextual anchors, 1160

contingent gesture, 1207 contraction, 522 control, 61, 336, 1109, 1119, 1129 conventional interpretations, 1166 conversational analysis, 1181 conversational rule, 1177–1181 Cooper storage, 569 coordinand, 725 Coordinate Structure Constraint, 744, 1093 coordination, 49, 65, 256, 259–264, 391, 896, 1119 nonconstituent, 1337 coordinator, 725 core grammar, 783, 1106, 1115, 1116, 1121, 1129, 1256, 1501 CoreGram, 130, 1115–1116, 1120–1122, 1256 corpus search structure-based, 1123 cosupposition, 1227 covariational conditional, *see* comparative correlative coverage, 1116, 1118, 1123, 1126–1128 cranberry word, *see* bound word cross-index, 184 cross-linguistic comparison, 1115 cross-serial dependencies, 1109n9, 1118 Curry-Howard Isomorphism, 1347 cyclic feature description, 392, 1115 D-structure, 1502 database query, 66 decomposability, 780 deep processing, 1116 default, 788–790, 1016 deferred ostension, 1219

deferred reference, 1157, 1219

deictic gesture, 1209, 1216–1221 deictic word, 1217 delivery gesture, 1228 DELPH-IN Consortium, 1109n9, 1109, 1113, 1116–1118, 1121 DELPH-IN MRS Dependencies (DM), 1113, 1114, 1126, 1127 demonstration act, 1209 demonstratum, 1157 dependency, 202, 1522, 1524 semantic, 1112–1114, 1119 syntactic, 1112n14, 1118 unbounded, 538 Dependency Grammar (DG), 1447–1486, 1521, 1530, 1533 dependency graph, 1112, 1123, 1126 Dependency Minimal Recursion Semantics (DMRS), 1112, 1114, 1127 dependency parsing, 1119 semantic, 1127 dependent type, 1173, 1357 Dependent Type Theory, 1357 dependent-marking, 189n18 derivation, *see* morphology Derivational Theory of Complexity, 1081n1 derived word, 156 diachrony, 227 dialog history, 1175 dialogue agent, 1166 dialogue gameboard, 862n16, 1166, 1175–1181 parameter, 1175 dialogue progress, 1166 diathesis alternation locative alternation, 147 voice alternation, 139

direct reference, 1160 directed acyclic graph, 1119 directionality, 1088 disambiguation, 66, 1123 discontinuous constituent, 51, 390–397, 402–403, 1119 Discourse Representation Theory, 1025, 1161 discriminant, 1112, 1123, 1124 disfluencies, 1084, 1087, 1157 disjoint union, 1065 disjunction distributed, 959–960 Displacement Calculus, 1344 Distributed Morphology, 128, 130, 355 distributional hypothesis, 1127 distributional semantics, 1127 ditransitive, 323, 325, 330, 333, 336, 342 documentation of grammars, 1107, 1117 domain, 1169, *see* linearization domain downdate, 1166, 1182 dual point, 1219 DUEL Corpus, 1230 dynamic semantics, 1161 Education Program for Gifted Youth, 1125 effects, conversational rule, 1177 ELAN, 1229 element constraint, 1093 Elementary Dependency Structures (EDS), 1112, 1127 elementary predication, 339 ellipsis, 391, 522, 635, 636 word internal, 131

Eloquent Software, 54 Elsewhere Condition, *see* Pāṇini's Principle emblem, 1209 emblematic gesture, 1209 emotions, 1201 empty element, 7n6, 912, 1115, 1267, 1273, 1277, 1284n27, 1293, 1304 empty operator, 1277, 1296 en bloc insertion, 782 endangered language, 1121 English Resource Grammar (ERG), 62, 65, 1109n9, 1110, 1113, 1116, 1125, 1126, 1129 Enju, 1119, 1125, 1127 entailments lexical, 328, 332, 336 EQ, 1114 equality modulo quantifiers, 1113 ergative, 189, 193 ergativity, 323–325 evaluation automatic, 1125 Exceptional Case Marking, 516 exemplification, 1222 EXMARaLDA, 1229 experience-based HPSG, 790 experiencer, 1163 experiencer predicates, 327 experiment, 1258 experimental semiotics, 1228 expletive, *see* pronoun extension condition, 1266 extra-grammatical mark-up, 1117 extraction, 158, 1119 subject, 57 extraction path effect, 548, 550

extraction pathway marking, 1359 extraposition, 57, 395–397, 451, 572–579, 637–639, 677–680, 898, 905, 1531 eye-tracking, 1085, 1086, 1096 f-description, 1399n3 face to face interaction, 1202 facial expressions, 1201 facts, 1165 feature, 8 -FEAT, x 1ST-PC, vii A-INDEX, 349 ACCENT, vii, 1071 ACT(OR), vii, 328 ADDRESSEE, vii, 601, 606 AFF, vii, 1223 AGR, vii ANAPH, vii ANCS, vii ANTEC, vii ARG-ST, vii, 137–140, 146, 149, 153, 154, 159–162, 165, 319–321, 325, 604n30, 608, 609n30, 627, 1012, *see* linking as locus of binding, 318 extended, *see* resultative, clause union ARG1, 1114 ARG2, 1114 ARG, vii AUX, viii BACKGROUND, viii, 1159 BACKGR, viii BD, viii BG, viii BODY, viii BOH, 1215

C-INDICES, viii C-INDS, viii, 1159 CARRIER, 1223 CASE, viii CATEGORY, viii CAT, 597, 600, 608, 619, 623 CLITIC, viii CLTS, viii CLUSTER, viii CL, viii COLL, viii COMPS, viii, 321, 619, 623, 625, 632 CONCORD, viii CONDS, viii CONTENT, viii, 600, 608, 611–614, 618, 623, 625, 633, 1159 CONTEXTUAL-INDICES, 1159 CONTEXT, viii, 1051, 1158, 1159, 1163 CONT, viii COORD, viii CORREL, viii CTXT, viii CVM, 1224 DEPS, viii, 1109n30 DET, viii DGB-PARAMS, 1175 DGB, 1166 DIR, 1215 DISCOURSE-REFERENT, 1029 DOMAIN, 391 DOM, viii DR, viii DSL, viii, 196 DTE, viii DTRS, viii ECONT, viii

EMBED, viii ENDING, viii EXCONT, viii EXC, viii EXPERIENCER, 1163 EXP, viii EXTENT, 1215 EXTRA, viii FACTS, 1165 FCOMPL, viii FC, viii FIG, viii FIRST, viii, 102, 1109n30 FOCUS, viii, 1161 FORM, ix FPP, ix FRAMES, 1520 G-DTR, 1220 GEND, ix GIVEN, ix GRND, ix GROUND, ix, 1161 GTOP, ix HANDSHAPE, 1215 HARG, ix HCONS, ix, 1047 HD-DTR, ix HD, ix HEAD-DAUGHTER, 612, 618, 628, 630 HEAD, ix, 597, 600, 608, 611–613, 618, 619, 623, 628, 630, 632, 633 HOOK, ix I-FORM, ix, 137 ICONS, ix, 1047, 1113 ICONT, ix IC, ix

IN(PUT), 345 INCONT, ix INC, ix INDEPENDENT-CLAUSE, 1279 INDEX, ix, 600, 606, 608, 613, 614, 619, 625, 633 IND, ix INFL-STR, 355 INFL, ix INFO-STRUC, ix INHER, ix INSTANCE, 606 INST, ix INV, ix, 611 IP, ix KEY, ix, 329–331, 339, 342 L-PERIPH, x LAGR, ix LARG, ix LATEST-MOVE, 1165 LBL, ix LEX-DTR, ix, 345, 346 LEXEME, ix LF, ix LID, ix, 1508 LIGHT, ix LINK, ix, 1162 LISTEME, ix LISZT, ix LOCAL, ix, 597, 600, 608, 612, 619, 623 LTOP, x MAIN, x MAJOR, x MARKING, x MAX-QUD, x MC, x, 611, 628 MINOR, x

MKG, x MODAL-BASE, x MODE, 1223 MOD, x, 597, 600, 608, 611, 612, 614, 619, 623, 628, 630–633 MOOD, x MORPH-B, x, 157 MORPH, x MPH, x MP, x MRKG, x MS, x, 1166 MUD, x NEG, x NH-DTRS, x NON-HEAD-DAUGHTERS, 611, 612, 618, 628 NON-HEAD-DTRS, x NONLOCAL, x, 600, 619, 623 NUCLEUS, 601, 606 NUCL, x NUMB, x N, x ORIENT, 1215 OTHER COMPLEMENTS, 138 PALM, 1215 PARAMS, x PARTS, x PATH, 1215, 1216, 1224 PA, x PC, x PERS, x PFORM, x PHON-STRING, x PHON, x, 1071 PHP, x PH, x POL, xi

POOL, xi POSITION, 1215 PRD, xi PRE-MODIFIER, xi PRED, xi PREF, xi PRIMARY OBJ, 138 PROP, xi, 606, 608, 613, 614 Q-PARAMS, 1175 Q-STORE, xi QSTORE, xi QUANTS, xi QUD, xi, 1165 QUES, xi R-MARK, xi RAGR, xi REALIZED, xi REL(ATION)S, 19 RELS, xi, 331, 339, 1520 REL, xi, 608, 611, 612, 618 RESTRICTIONS, xi RESTR, xi, 606, 608, 613, 614, 619, 633 REST, xi, 102, 1109n30 RETRIEVED, xi RLN, xi ROOT, xi RR, xi RSTR, 1114 S-DTR, 1220, 1224 SAL-UTT, xi SECOND OBJ, 138 SELECT, xi SEL, xi SIT-CORE, xi, 337 SIT, xi SLASH, xi, 608, 612, 618, 619, 625, 628, 630–632, 1095, 1096, 1109n30 SOA-ARG, xi SOA, xi, 606, 608 SOA (State of Affairs), 330 SPEAKER, xi, 601, 606 SPECIFIED, 1518n30 SPEC, xi SPR, xi STANDARD, 1163 STATUS, xi STEM, xi STM-PC, xi STORE, xi STRUC-MEANING, xi SUBCAT, xii, 317 SUBJ-AGR, xii SUBJ, xii, 321, 618 SYNC, 1215 SYNSEM, xii, 597, 600, 608 TAIL, xii, 1162 TAM, xii THETA-ROLE, 785, 787, 788 TNS, xii TO-BIND, 609, 620n30 TOPIC, xii TP, xii TRAJ(ECTORY), 1223 UND(ERGOER), 328 UNDER(GOER), 328 UND, xii UT, xii VAL, xii VAR, xii VEC, 1224 VFORM, xii, 618 V, xii WEIGHT, xii WH, xii

WRIST, 1215 XARG, xii ARG-ST extended, 351–353 feature cooccurrence restriction, 56 feature geometry, 136, 145 feature structure, 60 feature structure description, 60, 1107n4 feedback, 1177 feedback signal, 1177 FIGURE Corpus, 1230 File Change Semantics, 1161 filler, 55 fingerspelling alphabet, 1215 finite closure, 56 first-order logic, 1125 focus, 1161–1162, 1267, 1280, 1418, 1504 foreground gesture, 1208 formal language theory, 1107n5 formalism, 1117 differentiating from theory, 1106 formalization, 69, 1256 Forward Slash Elimination, 1344 Forward Slash Introduction, 1344 fractal architecture, 1161 fractal design, 1167 fractality, 1161, 1167 fragmentary utterance, 635 Frame Semantics, 1224, 1520–1521 Freezing, 680–681 function application, 1337 Function Composition, 1337, 1338 function type, 1174 functional application, 1501 functional equations, 1398

gap, 1341, 1342, 1347, 1349, 1350 dishonest, 580 parasitic, 61, 256, 258, 1342, 1359 gapping, 263, 756, 1364–1366 garden paths, 1087 gaze, 1201 gaze-pointing, 1209 gender, 221, 889 generalization, 1117 Generalized Phrase Structure Grammar (GPSG), 48–50, 52, 371, 372, 399, 782–784, 791, 1253, 1397, 1501, 1503n6, 1522, 1527 generalized quantifier, 1172 generation, 1106, 1107, 1117, 1119, 1125, 1128 Generative Grammar, 780, 782, 1045 Generative Semantics, 126, 127, 1274 genetics, 1504 gerunds, 152 gesture, 1157, 1299n38 gesture affiliation, 1205, 1206, 1212 gesture classes, 1208 gesture classification, 1210 gesture meaning, 1212 gesture representation, 1214–1216 gesture space, 1215 gesture stroke, 1211 gesture typology, 1221 gesture-as-by-product, 1208 gesture-as-product, 1208 gesture-speech relations, 1206 GF Resource Grammar Library, 1122 GG (German Grammar), 1116 given information, 1161–1162, 1165 Glue Semantics, 1012 gold parse, 1111 gold standard, 1111

good continuation, 1203 Government and Binding (GB), 191n23, 246, 275, 384, 893n9, 1159, 1254, 1498, 1501, 1502 gramm-index, 186 grammar engineering, 1120n25 multilingual, 1115–1118 grammar exploration, 1106 Grammar Matrix, 66, 264, 1116, 1121–1122 customization system, 1121, 1130 library, 1116, 1121 grammar tutoring, 66 grammar-dialogue interface, 1175 Grammatical Framework (GF), 1120n25 grammatical function, 895 grammaticalisation process, 1228 grammaticality, 1117 grammaticality judgment, 1116 Grammix, 66 ground, 1161 grounded reasoning, 1128 grounding, 1166, 1175, 1176, 1182 handshape, 1215 head, 9n9, 21, 1109, 1271 functional, 1267, 1277 Head Feature Principle, 734 head information, 1119 head movement, 370, 383–386, 812, 1407 head-marking, 189n18 hedge, 1125 heterogeneous universe, 1170 hierarchical lexicon, 141–150 Hinoki Treebank, 1123n30 holistic effect of direct objecthood, 338

homogeneous universe, 1170 HP Labs, 1106 HP-NL, 65 HPSG-TTR, 1182–1186 hypothetical reasoning, 1345 *i*-within-*i*-Condition, 924–925 iconic gesture, 1208, 1221–1228 idiom, 777–801, 1501 non-decomposable, 783 idiomaticity, 777–781 lexical, 778–779 pragmatic, 781 semantic, 780–781 statistical, 781, 790 syntactic, 779–780 illustrator gesture, 1208 imagistic gesture, 1202 immediate dominance, 371 immediate dominance schema, 318 inchoative, 798 incomplete demonstratives, 1204 [incr tsdb()], 1116, 1117 incremental processing, 1020n5, 1084–1085, 1161, 1182, 1300, 1375 index, 1157 index finger pointing, 1209 index palm down, 1209 index palm vertical, 1209 indexical expressions, 1160 indirect reference, 1157, 1219 infinitival VP, 1129 inflection, *see* morphology information extraction, 1125 biomedical, 1126 information packaging, 1161–1162 information retrieval, 1125 information state, 1165, 1166 information state model, 1166 information structure, 10, 1121, 1161–1300 inheritance, 142–148 default inheritance, 150–155, *see* YADU multiple inheritance, 148–150 insertion, 51 inside-out constraint, 788 intensional entities, 1172 interactive gesture, 1207, 1210, 1228–1229 interface, 1282 Interface model, 1232 interjections, 1176–1177 interlinear glossed text (IGT), 1122 internal structure of words, 131 interrogative, 62, 1266 intrinsic argument, 1112, 1113n15 introspection, 1258 inversion, 522 island, 579, 855, 1305 island constraint, 61, 1092, 1093, 1420 Item-and-Arrangement, 957 Item-and-Process, 958

Jacy, 1116 judgement, 1173

Kahina, 1116 kairemics, 1202 KoS, 1175–1181

label (MRS), 1114 labeled deduction, 1333 labeling, 1268, 1303 Lambek calculus, 1343 language acquisition, *see* acquisition language documentation, 1120–1124 language system, 1157–1158

language teaching, 1124 language understanding, 1128 Lassy Treebanks, 1123 latest move, 1165 lattice, 161 laughter, 1157 Lexical Access model, 1232 lexical boost, 1092 lexical class, 1122 lexical decomposition, 327, 334–337 lexical entry, 134–138, 1107n3, 1111, 1123 induction of, 1119 Lexical Functional Grammar (LFG), 10n10, 53, 126, 128, 136, 178, 186, 254–255, 322, 908n20, 1019, 1120n25, 1128n33, 1253, 1501 Lexical Integrity, 128, 129, 131–133, 839, 1352n16 lexical item, 1113 Lexical Mapping Theory, 1415 lexical representation, 1090 Lexical Resource Semantics (LRS), 13, 611n22, 638–639, 1026–1034, 1171, 1428 lexical rule, 61, 129, 155–165, 1107n3, 1122, 1284n27 Argument Attraction, 435 Complement Extraction, 158 Complement Extraposition, 573 extraposition, 898 for body part nouns, 906–907 Medio-Passive, 432 Passive, 345, 346 Subject Extraction, 609n18 lexical sign, 137 lexicalism, 127–134, 136, 141, 1091–1092, 1111, 1352n16, 1415n13, *see* Lexical Integrity lexicon, 61, 125–167, 784n14 Lexicon-Grammar, 779 liaison, 739 linear logic, 1428 linear precedence, 371 Linear Precedence Rule, 372 linearization domain, *see* order domain linearization-based HPSG, 52, 1355–1357 linguistic hypothesis testing, 1115, 1120–1124 linguistic knowledge, 1157–1158, 1177 Linguistic Knowledge Builder (LKB), 62, 65, 959, 1115, 1117 linguistic universal bottom-up exploration, 1115 link, 1162 linking, 145, 160, 316, 319, 327–331, 1121 meaning, 327–330, 332–333 passives, 346 lip-pointing, 1209 list, 102, 1109 local management system, 1181 locality, 69, 784–785, 1266 Strong Locality Hypothesis, 784 Strong Non-locality Hypothesis, 784 Weak Locality Hypothesis, 785 Weak Non-locality Hypothesis, 785 locutionary proposition, 1175 logic instruction, 66 logical form, 1125, 1170 logical variable, 1112

London-Lund corpus, 1165 machine learning, 1110–1112, 1119, 1122, 1125–1128 machine translation, 62, 63, 66, 790n21, 1126 macro, 1115 Mainstream Generative Grammar (MGG), 1501 maintainability of grammars, 1106 mal-rule, 1125 manifest field, 1174 mapping, 316, *see* linking McGraw-Hill Education, 66 McGurk effect, 1087 meaning constructors, 1428 meaning language, 1428 medium, 1156 mental states, 1166 Merge, 5, 1265, 1501 External, 1265 Internal, 1266 meta-grammar, 1117, 1121, 1130 Metagraph Grammar, 1273n19 metaphorical gesture, 1211 metarule, 55, 56 metonymy, 232 metrical tree, 1162 Micro-Cues, 1504 mildly context sensitive, 51 Minimal Recursion Semantics (MRS), 13, 19, 62, 339, 611n22, 638–639, 788, 1019–1026, 1112, 1114, 1171, 1221, 1428, 1520 Minimalism, 5, 14, 21, 67, 191n23, 220, 322, 379n5, 570, 646n66, 740, 1106, 1253–1306, 1501, 1521, 1530, 1532 Minimalist Grammars, 1503n6 mixed syntax, 1202 modal, 1125 modal transparency, 336–337 model theory, 1169–1172 model-theoretic semantics, 1169–1172 modularity, 988–990, 1085n5, 1090, 1282 modus ponens, 1336 morpheme, 1122 morphological analyzer, 1118 Morphological Blocking, *see* Pāṇini's Principle morphology, 130–134, 1115, 1122, 1273, 1300 A-Morphous, 958, 961 derivational, 157, 1523–1524, 1527 inflectional, 162 internal structure of words, 131 irregular morphology, 150, 151 Paradigm Function, 958 suppletion, 163 Morphosyntactic Alignment Principle, 341 morphotactics, 1122 Move, 1266–1267 movement, 5 A, 1266 A′, 1266 head, 1266 movement trajectory, 1215 multichart, 1213 multilogue, 1182 multimodal chart parser, 1211, 1213

multimodal communication, 1202 multimodal ensemble, 1228 multimodal grammar, 1217 multimodal integration, 1212 multimodal integration scheme, 1213 multimodal therapy, 1232 multimodal utterance, 1206 multiple fronting, 1068 multiple inheritance, 61 multiword expression, 63, 777, 1112, *see* idiom mutual beliefs, 1163, 1164, 1166 named entity recognizer, 1118 naming game, 1217 Natural Language Logic, 65 Natural Language Processing (NLP), 1125–1128 negation, 33n41, 352, 522, 811–839, 976, 1027, 1032, 1125–1126, 1450n1 negative concord, 1027 NEQ, 1114 new information, 1161–1162, 1165 non-cancellation, 1423 non-configurationality, 187 non-sentential utterances, 1157 non-verbal actions, 1157 non-verbal behaviour, 1201–1202 non-verbal communication, 1182 Nonlocal Feature Principle, 59 nonsentential utterances, 848 Notion Rule, 327 noun, 15, 1129 NPN Construction, 1271, 1290, 1530 numeration, 1264, 1300n40, 1303, 1504 o-bind, 897

o-command, 322, 324, 897, *see* anaphoric binding local, 897 o-free, 897 object, 322 direct, 895n11 indirect, 895n11 primary, 895 second, 895 obligatory gesture, 1207 oblique argument, 318, 322, 331–332, 340, 341 obliqueness, 58, 61, 138, 139, 255, 322, 354, 379, 895, 897, 1286, 1369 observer viewpoint, 1226 obviation, 321 off-path constraints, 1420n16 online type construction, *see* type underspecified hierarchical lexicon, TUHL ontology acquisition, 66 open hand pointing, 1209 open-source, 1116 opinion analysis, 1126 Optional Quantifier Merger, 1362 order domain, 133, 390–402, 636, 749, 1162, 1423 pantomime, 1211 Paradigm Function Morphology, 1355n19 paralinguistics, 1201 parameter, 379 parasitic gap, *see* gap parasitic scope, 1349 ParGram, 1122 Parole, 1118 parse forest, 1111, 1117

parse ranking, 1110–1112, 1117, 1118
parse selection, 66
parsing, 1106, 1107, 1117, 1119, 1126
  chart parsing, 1109
parsing algorithm, 1108
part-of-speech tagging, 1111<sup>13</sup>
partial verb phrase fronting, 401, 458, 1304, 1524–1525
participant role, *see* semantic role
passive, 57, 126, 148, 336, 344–351, 550<sup>12</sup>, 891<sup>5</sup>, 896
  impersonal, 1293
  remote, 390
pending, 1176
Penn Treebank, 1119
performance, 1081, 1157, 1263, 1305
Performance–Grammar Correspondence Hypothesis, 1082
periphery, 1106, 1129, 1256, 1501
PET, 1117
Phase, 1300, 1303, 1505
phenogrammar, 1356
phonetic speech–gesture constraint, 1205
phonology, 10, 14, 787, 969, 1049, 1271, 1276, 1300, 1518
phrasal lexical entry, 791
phrase structure rule, 1107<sup>3</sup>
phraseme, *see* idiom
phraseological pattern, 796
phraseological unit, *see* idiom
phraseologism, *see* idiom
pied-piping, 598, 1268, 1360, *see* relative inheritance
pointing, 1209
pointing cone, 1218
pointing gesture, 1216–1221
polar question, 1181
polysynthetic, 187
position class, 1122
possible worlds, 1172
Post-Auxiliary Ellipsis, 849–850
post-stroke hold, 1211
Poverty of the Stimulus, 1505
pragmatics, 893
pre-stroke hold, 1211
precision, 1126, 1129
preconditions, conversational rule, 1177
predicate, 1114
  abstract, 1113
  lexical, 1113
  secondary, 641
    depictive, 896
    resultative, 351–353, 357
predicate-argument structure, 1112, 1113, 1127
Predictability Hypothesis, 791
preparation phase, 1211
principle
  Agreement Marking, 235
  Argument Realization (ARP), 17, 182, 319, 321, 427, 429, 865
  Binding
    Principle A, 138, 224, 898, 908, 1368, 1369, 1373
    Principle B, 898, 908, 1368, 1370, 1372
    Principle C, 237<sup>7</sup>, 898, 908, 1368, 1373
    Principle Z, 908
  Binding Inheritance (BIP), *see* Nonlocal Feature Principle
  Blocking, 964
  Case, 251, 836
  Constituent Order, 377
  DGB-Params, 1185
  Extended Focus Projection, 1053, 1062
  Focus Inheritance, 1045, 1055
  Head Feature, 22, 61, 96, 105, 135, 144, 224, 279, 280, 285, 1185, 1398, 1402, 1516–1517
    Generalized, 23, 25, 550, 560
  LRS Projection, 1027–1028
  Morph Ordering, 969
  Nonlocal Feature, 551, 568, 574
  of Canonicality, 183
  Phon, 1185
  Pāṇini's, 971–979
  Raising, 500
  Semantics, 224, 234, 1004, 1010–1012, 1019, 1023, 1029, 1034
  Sign, 34, 583, 1515<sup>19</sup>
  SLASH Amalgamation, 549, 556, 626
  SLASH Inheritance, 550
  Subcategorization, 61, 318, 1352<sup>17</sup>, 1514
  Trace, 609<sup>18</sup>
  Valence, 23, 25, 1401, 1514
  *Wh*-Inheritance, 613<sup>25</sup>
  Word, 97, 784<sup>14</sup>
Principle of Completeness, 1414
Principles & Parameters, 1504
private information, 1166
PRO, 617, 618, 632, 633
*pro*-drop, 184<sup>13</sup>, 222
pro-index, 181
pro-speech gesture, 1202
probe, 1265
processing, 1157
processing speed, 1156
processing time, 1118
production, 1085, 1087, 1088, 1090, 1263
Project DeepThought, 1116, 1121
projection, 1227
pronoun, 1159
  expletive, 61, 898, 1293
  first person, 1160
proof normalization, 1347
proper name, 1160, 1163
proposition, 1175
prosody, 1070–1073, 1086, 1162, 1206
proto-role, 327, 330, *see* semantic role
proverb, 781
proxemics, 1202
pruning, 1111
PTT, 1182
public information, 1166
PyDelphin, 1114, 1117
Pāṇini's Principle, 957, 963, *see* principle
QEQ, 1113, 1114
quantificational parameters, 1175
quantified noun phrases, 1172
quantifier, 1113, 1114
querying relational databases, 65
question answering, 66
question under discussion, 1165, 1166, 1175, 1209
question-answering pair, 1165
QuickSet system, 1213
raising, 61, 897, 920, 1109, 1129, 1266
rationale clause, 336
realization ranking, 1118
recall, 1126
record, 1172, 1173
record type, 1172, 1173
recursion, 1109
Redbird Advanced Learning, 66
reduplication, 51, 1290
Redwoods, 1123, 1124, 1129
reference, 1163
reference resolution, 1128
referential interpretation, 1175
referential NPs, 1163
referring expression, 890
regulator gesture, 1208
relation
  *append*, 101, 112, 115, 375<sup>30</sup>
  *member*, 101
  *o-command*, 101
  *shuffle*, 17<sup>30</sup>, 101, 182<sup>30</sup>, 391
relational constraint, 375<sup>2</sup>, 376, 379, 1115
Relational Grammar, 322, 345
relative clause, 62, 595–648, 787, 1094–1096, 1255, 1277–1278
  antecedent of, 596, 598–600, 605–607, 611, 613–616, 620, 623, 634–637, 640<sup>57</sup>, 643
  appositive, *see* relative clause, non-restrictive
  as complement
    in cleft construction, 640–641
    in dependent noun construction, 641–642
    in pseudo-relative construction, 642–643
    of *diejenigen*, 640
    of superlative adjective, 640
  bare, 626–633
    in English, 629–633
    in Japanese, 627–629
    in Korean, 627–629
  construction, 609–618, 620–622, 626–633, 636
  *dont*-relative, 622–626
  empty relativizer, 607–609, 634
  extraposition of, 637–639
  free, 256, 257, 263, 559–560, 643–647, 896, 912
  fused relative, *see* relative clause, free
  headed by complementizer, 618–626
    in Arabic, 619–620
    in English, 620–622
    in French, 622–626
  headed by verb, 627–629
  headless relative, *see* relative clause, free
  hydra, 611<sup>22</sup>
  infinitival, *see* relative clause, non-finite
  internally headed, 627<sup>40</sup>, 642
  matching analysis of, 615
  non-finite, 617–618
  non-restrictive, 633–636
  predicative, 641
  R, RP, *see* relative clause, empty relativiser
  raising analysis of, 615, 616<sup>28</sup>
  reduced, 632–633
  relative-correlative, 639<sup>55</sup>
  relativised constituent, 597, 615, 623–625
  subject relative, 609<sup>18</sup>, 610<sup>20</sup>
  supplemental, *see* relative clause, non-restrictive
  *that*-less, *see* relative clause, bare
  transparent free, 646–647
  *wh*-phrase in, 596–618, 620, 630, 634, 643–646
  *wh*-relative, 596–615, 617–618, 619<sup>31</sup>, 620
relative inheritance, 598–605, 609, 612<sup>24</sup>, 613, 615, 633, 642<sup>60</sup>, 644–646
relative percolation, 598, *see* relative inheritance
relative pronoun, 596–600, 604–606, 608, 613, 616, 618–621, 626, 630<sup>43</sup>, 633–636, 639<sup>55</sup>, 641, 643–645, 648
relativization, 896
Relativized Minimality, 495
remnant movement, 1304
representational gesture, 1208, 1221–1228
reprise content hypothesis, 1168
reprise fragment, 1167–1169
resemblance, 1208
resultative, *see* predicate
resumptive pronoun, 539, 616, 620, 623–626
retraction phase, 1211
Right Node Raising, 70, 752, 756, 760–763, 873–876, 1339
Right Roof Constraint, 573, 639, 678–680
robot control, 66
Robust Minimal Recursion Semantics, 1221
robust parsing, 1126
robust processing, 1117
rule-to-rule approach, 1001
SaGA, 1203<sup>1</sup>
SaGA Corpus, 1230
schema, 372, 1401
  Argument-Cluster, 757
  Big Mess, 303
  Comparative Correlative, 739–740
  Coordination, 670, 732, 756, 877
  Filler-Head, 24, 542, 543, 612
  free-relative, 560
  Head-LIGHT, *see* schema, Head-Cluster
  Head-Adjunct, 374, 672
  Head-All-Valents, 198
  Head-Cluster, 443, 457, 468, 829
  Head-Complement, 376, 379, 456, 1286
  Head-Complements, 22, 441
  Head-Extra, 574
  Head-Fragment, 862, 873
  Head-Functor, 734
  Head-Relative, 613
  Head-Subject-Complements, 466
  Metonymy, 234
  NPN, 1532
  Right Peripheral Ellipsis, 761
  Specifier-Head, 381
  Subject Head, 24
scope, 379, 1112–1114, 1294–1296
  of negation, 1126
  of quantificational NPs, 61
scrambling, 199, 370–373, 375, 379, 390, 394, 402–403, 896, 1293–1296
segmentation, 1112
Segmented Discourse Representation Theory, 1182
semantic representation, 1112, 1117, 1118, 1125, 1169–1172
  variable-free, 1112
semantic role, 315, 319, 324, 327, 331, 333–334, 337, 341, 1114
semantic type, 1169
semantics, 10, 327, 475–479, 1001–1035, 1271, 1273, 1300
sembank, 1122–1124
semiotics, 1202
sentence chunking, 1118
sentence length, 1108
Sentential Subject Constraint, 684
serialization, 896
Shakespeare, 1204
shallow processing, 1116
ShapeWorld, 1128
shared assumptions, 1163
shared attention, 1157
shared information gesture, 1228
shared knowledge, 1157
Shared Task on Extrinsic Parser Evaluation, 1126
shuffle, 102, 159, 353, 1106
sign, 1175
sign formation, 1228
sign language, 1211
Sign-Based Construction Grammar (SBCG), *see* Construction Grammar
signature, 100, 143, 754
similarity, 1208, 1222
Simpler Syntax, 68
singleton type, 1174
situated communication, 1157
Situated Prosodic Word Constraint, 1220, 1225
situatedness, 1156
situation, 1173, 1176
Situation Semantics, 1002, 1160, 1176
situation theory, 1170
situation type, 1176
sketch model, 1232
Slash Elimination, 1344
Slash Introduction, 1344
slingshot argument, 1172
sluicing, 848
SmartKom Corpus, 1230
software, 1114–1122, 1124<sup>32</sup>
some, 1428
sort, 8
  parametric, 113
source language, 1126
span, 1109
speaker, 1160
specificational pseudo-cleft, 646–647
speculation, 1125
speech, 1299<sup>38</sup>
speech–gesture ensemble, 1222, 1228
speech–gesture integration, 1212, 1217
speech–gesture production models, 1202, 1232
split ergativity, 194
Spoken Dutch Corpus, 1118
spoken language, 1158
standard, 1163
standard meaning, 1163
standardization, 1156
state-of-affairs, 19
statistical model, 66, 1110–1112, 1119, 1122, 1125–1128
statistics, 63
string, 1166
string theory of events, 1166
stroke phase, 1211
Stroop effect, 1087
structural priming, 1089, 1091
structure sharing, 1214
structured content, 1172
subjacency, 677–678
Subjacency, 1305
subject, 322, 895, 1109
successive cyclicity, 1420<sup>16</sup>
summarization, 1126
  abstractive, 1126
  extractive, 1126
supertagging, 1111, 1118
surface-oriented, 1089
suspended affixation, 131
symmetrical object, 325
synchrony, 1202
syntactic category, 1335
syntactic type, 1335
syntax, 1271, 1300
taboo, 1228
taboo gesture, 1228
tacesics, 1202
tail, 1162
target language, 1126
tectogrammar, 1356
temporal relationship, 1214
tense, 1160
test suite, 1110, 1116
thematic hierarchy, 324, 332, 333
thematic role, *see* semantic role
thumb pointing, 1209
thumbs-up gesture, 1209
Tibidabo Treebank, 1123<sup>30</sup>
token, 1108, 1111, 1113, 1172
tokenization, 1112
Top-down Phase-based Minimalist Grammar (TPMG), 1503<sup>6</sup>
topic, 1161, 1267, 1280, 1414, 1418
topic drop, 388, 896
topological field, 133, 398
TRALE, 66, 1115, 1117
transfer grammar, 1126
Transformational Grammar, 5, 49, 779, 1265<sup>12</sup>
transparency, 780
Tree Adjoining Grammar (TAG), 1108, 1112, 1501, 1527
treebank, 66, 1106<sup>1</sup>, 1111, 1112, 1117, 1119, 1122–1124, 1129
  searchable, 1123
truth-conditional difference, 1161
truth-conditional semantics, 1128, 1170
TUHL, *see* Type Underspecified Hierarchical Lexicon
Turing-complete, 1108, 1109
turn management, 1207
turn-assigning gesture, 1228
turn-taking, 1157
two-dimensional theory of idioms, 791
type, 8, 1171, 1172, 1277
  *arc*, 1216
  *atomic*, 12
  *case*, 249, 258
  *clause*, 610
  *comp*, 617<sup>30</sup>
  *core-cl*, 610
  *decl-cl*, 610
  *diff-list*, 1047
  *fact*, 606<sup>30</sup>, 606, 608, 609, 614<sup>30</sup>, 614, 619<sup>30</sup>
  *fin-wh-rel-cl*, 610–612, 635
  *hd-relative-mod-phrase*, 628
  *head-adjunct-phrase*, 614<sup>30</sup>, 614, 648
  *head-comp-cx*, 1514, 1516
  *head-complement-phrase*, 22, 1513
  *head-filler-phrase*, 24, 611, 612, 617
  *head-nexus-phrase*, 612
  *head-relative-mod*, 628
  *head-relative-phrase*, 613, 614
  *head-subject-phrase*, 24, 641
  *headed-phrase*, 391
  *imp-cl*, 610
  *inf-head-filler-phrase*, 618
  *inf-head-filler-rel-cl*, 618
  *inf-wh-rel-cl*, 610, 618
  *info-str*, 1047
  *lexeme*, 18
  *line*, 1216
  *list*, 1109<sup>30</sup>
  *local*, 623<sup>30</sup>
  *located\_command*, 1214
  *message*, 606<sup>30</sup>
  *naming*, 601<sup>30</sup>
  *nom-object*, 606<sup>30</sup>
  *non-empty-list*, 1109<sup>30</sup>
  *non-finite*, 618
  *non-wh-rel-cl*, 610, 630
  *noun*, 15, 597, 600, 607, 611, 613, 628, 633, 635
  *nprl*, 623<sup>30</sup>, 623
  *null*, 1109<sup>30</sup>
  *outcome*, 606<sup>30</sup>
  *parameter*, 600<sup>30</sup>
  *phrase*, 27
  *prl*, 623<sup>30</sup>, 625
  *proposition*, 606<sup>30</sup>, 610<sup>30</sup>, 610, 611, 614<sup>30</sup>
  *psoa*, 606<sup>30</sup>
  *question*, 606<sup>30</sup>
  *red-rel-cl*, 610, 632
  *rel-cl*, 610, 614, 622, 630, 635
  *scope-object*, 600<sup>30</sup>, 606<sup>30</sup>, 606–608, 614
  *simp-inf-rel-cl*, 610, 631<sup>30</sup>
  *soa*, 601, 606
  *state-of-affairs*, 601, 606<sup>30</sup>, 606, 607, 610<sup>30</sup>
  *that*, 621
  *unmarked*, 621
  *v-mod*, 628<sup>30</sup>, 628
  *verb*, 15, 597, 607, 617<sup>30</sup>, 619<sup>30</sup>
  *verbal*, 612, 617<sup>30</sup>
  *wh-rel-cl*, 610, 611
  *which*, 636
  *word*, 613<sup>30</sup>
type hierarchy, 8, 64, 94, 296, 1106
  lexical, 18, 141–155, 1091
Type Raising, 1337, 1338
type shifting, 1284<sup>27</sup>
type theory, 1172–1175
Type Theory with Records, 1172–1175
type underspecification, 161–165
Type Underspecified Hierarchical Lexicon (TUHL), 163
Type-Logical Categorial Grammar (TLCG), 1332
Type-Logical Grammar, 754
type-theoretical semantics, 1170–1175
TypeGram, 1122<sup>28</sup>
typological property, 1122
ubertagging, 1112, 1118
unbounded dependency, 49, 55, 61, 65, 158, 596–648, 1266
  strong, 620
  weak, 620
underspecification, 92, 1092, 1107<sup>4</sup>
unification, 90, 1168, 1510
Uniformity of Theta Assignment Hypothesis (UTAH), 322, 332, 1271, 1273
Universal Dependencies (UD), 1119, 1123
Universal Grammar (UG), 202, 1256, 1283, 1504
universe, 1169
update, 1165, 1166
usage-based grammar, 790
V2 order, *see* word order
valence, 11, 160, 1111, 1118, 1129
  list, 1109
vector space model, 1128
verb, 15, 1129
  of speculation, 1125
  unaccusative, 435<sup>10</sup>, 1066
verb second, *see* word order
Verb*mobil*, 62, 63, 65, 66, 1019, 1116
vertical slash, 1348
Vertical Slash Elimination, 1348
Vertical Slash Introduction, 1348
visual question answering (VQA), 1128
visual situation, 1175
voice
  active, 918
  agentive, 323
  objective, 323, 918
*Vorfeldellipse*, *see* topic drop
Wasow's Generalization, 727
WeScience, 1124
WeSearch, 1123<sup>29</sup>
*wh*-percolation, 598, *see* relative inheritance
Wikipedia, 1124
WikiWoods, 1124, 1128
witness, 1173, 1175
witness set, 1172
Word Grammar, 1459–1486, 1533<sup>29</sup>
word order, 1122
  SOV, 380–383, 384<sup>8</sup>, 1292
  SVO, 380–383, 384<sup>8</sup>, 1292
  V2, 370, 383, 384<sup>8</sup>, 398–401, 780, 1068–1070, 1120, 1508
  verb-penultimate, 1120
  VOS, 1063
  VSO, 195–202
word retrieval, 1207
word with spaces, 779
Word-and-Paradigm, 958
world knowledge, 1085, 1087, 1300
wrapping, 51
written language bias, 1157
X̄ theory, 1304
YADU, 150
YY Software, 63

# Head-Driven Phrase Structure Grammar

Head-Driven Phrase Structure Grammar (HPSG) is a linguistic framework that models linguistic knowledge on all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) using feature–value pairs, structure sharing, and relational constraints. This volume summarizes work that has been done since the mid-1980s. Various chapters discuss the formal foundations and basic assumptions of the framework, describe its evolution, and go into the details of various syntactic phenomena. Separate chapters are devoted to non-syntactic levels of description. The book also covers related fields and research areas (gesture, sign languages, computational linguistics) and includes a part in which HPSG is compared to other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).